To see the other types of publications on this topic, follow the link: 3D-Convolutional Neural Network (3D-CNN).

Dissertations / Theses on the topic '3D-Convolutional Neural Network (3D-CNN)'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic '3D-Convolutional Neural Network (3D-CNN).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Rochford, Matthew. "Visual Speech Recognition Using a 3D Convolutional Neural Network." DigitalCommons@CalPoly, 2019. https://digitalcommons.calpoly.edu/theses/2109.

Full text
Abstract:
Mainstream automatic speech recognition (ASR) makes use of audio data to identify spoken words; however, visual speech recognition (VSR) has recently been of increased interest to researchers. VSR is used when audio data is corrupted or missing entirely, and also to further enhance the accuracy of audio-based ASR systems. In this research, we present both a framework for building 3D feature cubes of lip data from videos and a 3D convolutional neural network (CNN) architecture for performing classification on a dataset of 100 spoken words, recorded in an uncontrolled environment. Our 3D-CNN architecture achieves a testing accuracy of 64%, comparable with recent works, but using an input data size that is up to 75% smaller. Overall, our research shows that 3D-CNNs can be successful in finding spatial-temporal features using unsupervised feature extraction and are a suitable choice for VSR-based systems.
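For readers unfamiliar with the approach, the sketch below shows the general shape of a 3D CNN acting on clips of lip crops. It is a minimal illustration, not the thesis architecture; the PyTorch framework and all layer sizes are assumptions.

```python
# Minimal sketch (not the thesis architecture): a small 3D CNN that
# classifies fixed-size lip-region clips into 100 word classes.
# Input layout assumed: (batch, channels, frames, height, width).
import torch
import torch.nn as nn

class Small3DCNN(nn.Module):
    def __init__(self, num_classes: int = 100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),   # grayscale input
            nn.ReLU(),
            nn.MaxPool3d(2),                              # halve time and space
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
        )
        self.classifier = nn.LazyLinear(num_classes)      # infers flattened size

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

clip = torch.randn(8, 1, 16, 64, 64)    # 8 clips, 16 frames of 64x64 crops
logits = Small3DCNN()(clip)             # -> (8, 100)
```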
APA, Harvard, Vancouver, ISO, and other styles
2

Castelli, Filippo Maria. "3D CNN methods in biomedical image segmentation." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/18796/.

Full text
Abstract:
A definite trend in biomedical imaging is the integration of increasingly complex interpretative layers into the pure data-acquisition process. One of the most interesting and anticipated goals in the field is the automatic segmentation of objects of interest in extensive acquisition data, a target that would allow biomedical imaging to move beyond its use as a purely assistive tool and become a cornerstone of ambitious large-scale challenges such as the extensive quantitative study of the human brain. In 2019, convolutional neural networks represent the state of the art in biomedical image segmentation, and scientific interest from a variety of fields, spanning from automotive to natural-resource exploration, converges on their development. While most applications of CNNs focus on single-image segmentation, biomedical image data (be it MRI, CT scans, microscopy, etc.) often benefits from a three-dimensional volumetric representation. This work explores a reformulation of the CNN segmentation problem that is native to the 3D nature of the data, with particular interest in applications to fluorescence-microscopy volumetric data produced at the European Laboratories for Nonlinear Spectroscopy in the context of two large international human brain study projects: the Human Brain Project and the White House BRAIN Initiative.
APA, Harvard, Vancouver, ISO, and other styles
3

Liu, Ruixu. "Attention Based Temporal Convolutional Neural Network for Real-time 3D Human Pose Reconstruction." University of Dayton / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=dayton157546836015948.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Broyelle, Antoine. "Automated Pulmonary Nodule Detection on Computed Tomography Images with 3D Deep Convolutional Neural Network." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-231930.

Full text
Abstract:
Object detection on natural images has become a single-stage end-to-end process thanks to recent breakthroughs in deep neural networks. By contrast, automated pulmonary nodule detection is usually a three-step method: lung segmentation, generation of nodule candidates, and false-positive reduction. This project tackles the nodule detection problem with a single-stage model using a deep neural network. Pulmonary nodules have unique shapes and characteristics which are not present outside of the lungs. We expect the model to capture these characteristics and to focus only on elements inside the lungs when working on raw CT scans (without the segmentation). Nodules are small, distributed and infrequent. We show that a well-trained deep neural network can spot relevant features and keep a low number of region proposals without any extra preprocessing or post-processing. Due to the visual nature of the task, we designed a three-dimensional convolutional neural network with residual connections, inspired by the region proposal network of the Faster R-CNN detection framework. The evaluation is performed on the LUNA16 dataset. The final score is 0.826, which is the average sensitivity at 0.125, 0.25, 0.5, 1, 2, 4, and 8 false positives per scan. It can be considered an average score compared to other submissions to the challenge. However, the solution described here was trained end-to-end and has fewer trainable parameters.
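The score quoted above is the LUNA16 challenge metric: sensitivity averaged over seven fixed false-positive rates per scan. A small sketch of that computation; the sensitivity values here are illustrative, not the thesis's results.

```python
# Sketch of the evaluation score described above: average sensitivity at
# 0.125, 0.25, 0.5, 1, 2, 4 and 8 false positives per scan.
# `sens_at_fp` maps an FP-per-scan rate to the sensitivity read off the
# FROC curve; the example values are made up for illustration.
FP_RATES = [0.125, 0.25, 0.5, 1, 2, 4, 8]

def average_sensitivity(sens_at_fp: dict) -> float:
    return sum(sens_at_fp[r] for r in FP_RATES) / len(FP_RATES)

example = {0.125: 0.66, 0.25: 0.74, 0.5: 0.80, 1: 0.85, 2: 0.89, 4: 0.92, 8: 0.93}
print(round(average_sensitivity(example), 3))
```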
APA, Harvard, Vancouver, ISO, and other styles
5

Jackman, Simeon. "Football Shot Detection using Convolutional Neural Networks." Thesis, Linköpings universitet, Institutionen för medicinsk teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157438.

Full text
Abstract:
In this thesis, three different neural network architectures are investigated to detect the action of a shot within a football game using video data. The first architecture uses conventional convolution and pooling layers for feature extraction. It acts as a baseline and gives insight into the challenges faced during shot detection. The second architecture uses a pre-trained feature extractor. The last architecture uses three-dimensional convolution. All these networks are trained using short video clips extracted from football game video streams. Apart from investigating network architectures, different sampling methods are evaluated as well. This thesis shows that amongst the three evaluated methods, the approach using MobileNetV2 as a feature extractor works best. However, when applying the networks to a video stream there are a multitude of challenges, such as false positives and incorrect annotations, that inhibit the potential of detecting shots.
APA, Harvard, Vancouver, ISO, and other styles
6

Pedrazzini, Filippo. "3D Position Estimation using Deep Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254876.

Full text
Abstract:
The estimation of the 3D position of an object is one of the most important topics in the computer vision field. Where the final aim is to create automated solutions that can localize and detect objects from images, new high-performing models and algorithms are needed. Due to the lack of relevant information in single 2D images, approximating the 3D position can be considered a complex problem. This thesis describes a method based on two deep learning models, the image net and the temporal net, that can tackle this task. The former is a deep convolutional neural network intended to extract meaningful features from the images, while the latter exploits the temporal information to reach a more robust prediction. This solution achieves a lower Mean Absolute Error compared to existing computer vision methods under different conditions and configurations. A new data-driven pipeline has been created to deal with 2D videos and extract the 3D information of an object. The same architecture can be generalized to different domains and applications.
APA, Harvard, Vancouver, ISO, and other styles
7

Fucili, Mattia. "3D object detection from point clouds with dense pose voters." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17616/.

Full text
Abstract:
Object recognition has always been a challenging task for computer vision. It finds application in many fields, mainly in industry, for example to allow a robot to find the objects to grasp. In recent decades, such tasks have found new ways of being accomplished thanks to the rediscovery of neural networks, in particular convolutional neural networks. This type of network has achieved excellent results in many object recognition and classification applications. The trend now is to use such networks in the automotive industry as well, in an attempt to make the dream of self-driving cars real. There are many important works on detecting cars from images. In this thesis we present our convolutional neural network architecture for recognizing cars and their position in space, using only lidar input. Storing the information about the bounding box around the car at the point level ensures a good prediction even in situations where the cars are occluded. Tests are run on the most widely used dataset for car and pedestrian detection in autonomous-driving applications.
APA, Harvard, Vancouver, ISO, and other styles
8

Galan, Martínez Silvia 1992. "Chromatin organization : Meta-analysis for the identification and classification of structural patterns." Doctoral thesis, Universitat Pompeu Fabra, 2020. http://hdl.handle.net/10803/670278.

Full text
Abstract:
The development of high-throughput Chromosome Conformation Capture (3C) experiments has provided valuable information about genome architecture, particularly Hi-C, a 3C derivative, which has become the standard technique to study 3D chromatin organization and its biological and functional implications. Nonetheless, there is a lack of gold standards for its bioinformatic analysis and interpretation. In this thesis, we develop an artificial neural network, Metawaffle, which is able to classify structural patterns without prior information. This allows the examination of the ability of CTCF to form chromatin loops and the identification of its epigenetic signature. The identification of chromatin loops permits the generation of a convolutional neural network, LOOPbit, for de novo chromatin loop detection in Hi-C contact matrices. Finally, we present a bioinformatic tool, CHESS, for the comparison of contact matrices and the specific identification and extraction of differential features, such as TADs, stripes or loops.
APA, Harvard, Vancouver, ISO, and other styles
9

Li, Vladimir. "Evaluation of the CNN Based Architectures on the Problem of Wide Baseline Stereo Matching." Thesis, KTH, Datorseende och robotik, CVAP, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-192476.

Full text
Abstract:
Three-dimensional information is often used in robotics and 3D mapping. There exist several ways to obtain a three-dimensional map, but the time-of-flight used in laser scanners or the structured light utilized by Kinect-like sensors is sometimes not sufficient. In this thesis, we investigate two CNN-based stereo matching methods for obtaining 3D information from a grayscale pair of rectified images. While the state-of-the-art stereo matching method utilizes a Siamese architecture, in this project a two-channel and a two-stream network are trained in an attempt to outperform the state of the art. A set of experiments was performed to find optimal hyperparameters, training the networks with the architectures mentioned above while changing one parameter at a time. After training, the networks are evaluated on two criteria: the error rate and the runtime. Due to time limitations, we were not able to find optimal learning parameters. However, by using settings from [17], we trained a two-channel network that performed almost on the same level as the state of the art. The error rate on the test data for our best architecture is 2.64%, while the error rate for the state-of-the-art Siamese network is 2.62%. We were not able to achieve better performance than the state of the art, but we believe that it is possible to reduce the error rate further. On the other hand, the state-of-the-art Siamese stereo matching network is more efficient and faster during disparity estimation. Therefore, if time efficiency is prioritized, the Siamese-based network should be considered.
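The two-channel design mentioned above stacks the left and right patches as input channels of a single CNN, whereas a Siamese network embeds each patch separately. A minimal sketch of the two-channel variant, assuming PyTorch; the patch and layer sizes are illustrative, not the thesis settings.

```python
# Sketch of the "two-channel" idea: left and right grayscale patches are
# stacked as two input channels and a plain CNN scores their match.
import torch
import torch.nn as nn

two_channel_net = nn.Sequential(
    nn.Conv2d(2, 32, kernel_size=3), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(1),                  # single matching score
)

left = torch.randn(16, 1, 11, 11)      # 16 left patches
right = torch.randn(16, 1, 11, 11)     # 16 candidate right patches
score = two_channel_net(torch.cat([left, right], dim=1))  # -> (16, 1)
```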
APA, Harvard, Vancouver, ISO, and other styles
10

Rydén, Anna, and Amanda Martinsson. "Evaluation of 3D motion capture data from a deep neural network combined with a biomechanical model." Thesis, Linköpings universitet, Institutionen för medicinsk teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176543.

Full text
Abstract:
Motion capture has in recent years grown in interest in many fields, from the game industry to sport analysis. The need for reflective markers and expensive multi-camera systems limits adoption, since they are costly and time-consuming. One solution to this could be a deep neural network trained to extract 3D joint estimations from a 2D video captured with a smartphone. This master thesis project has investigated the accuracy of a trained convolutional neural network, MargiPose, that estimates 25 joint positions in 3D from a 2D video, against a gold-standard, multi-camera Vicon system. The project has also investigated whether the data from the deep neural network can be connected to a biomechanical modelling software, AnyBody, for further analysis. The final intention of this project was to analyze how accurate such a combination could be in golf swing analysis. The accuracy of the deep neural network has been evaluated with three parameters: marker position, angular velocity and kinetic energy for different segments of the human body. MargiPose delivers results with high accuracy (Mean Per Joint Position Error (MPJPE) = 1.52 cm) for a simpler movement, but for a more advanced motion such as a golf swing, MargiPose achieves less accuracy in marker distance (MPJPE = 3.47 cm). The mean difference in angular velocity shows that MargiPose has difficulties following segments that are occluded or have greater motion, such as the wrists in a golf swing, where they both move fast and are occluded by other body segments. The conclusion of this research is that it is possible to connect data from a trained CNN with a biomechanical modelling software. The accuracy of the network is highly dependent on the intention of the data. For the purpose of golf swing analysis, this could be a great and cost-effective solution which could enable motion analysis for professionals but also for interested beginners. MargiPose shows high accuracy when evaluating simple movements. However, when using it with the intention of analyzing a golf swing in a biomechanical modelling software, the outcome might be beyond the bounds of reliable results.
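The MPJPE figures above are the mean Euclidean distance between predicted and reference joint positions. A small sketch of the computation (NumPy assumed; the data here is synthetic):

```python
# Sketch of the reported error measure: Mean Per Joint Position Error,
# the average Euclidean distance between predicted and reference joints.
import numpy as np

def mpjpe(pred: np.ndarray, ref: np.ndarray) -> float:
    """pred, ref: arrays of shape (frames, joints, 3), in the same units."""
    return float(np.linalg.norm(pred - ref, axis=-1).mean())

pred = np.random.rand(100, 25, 3)   # 25 joints, as in MargiPose's output
ref = pred + 0.015                  # toy reference: 1.5 cm offset on each axis
print(f"MPJPE: {mpjpe(pred, ref) * 100:.2f} cm")
```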
APA, Harvard, Vancouver, ISO, and other styles
11

Slunský, Tomáš. "Vícetřídá segmentace 3D lékařských dat pomocí hlubokého učení." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-400891.

Full text
Abstract:
This Master's thesis deals with multiclass image segmentation using convolutional neural networks. The theoretical part focuses on image segmentation, covering the basic principles of neural networks and several approaches to segmentation. In the practical part, the U-Net architecture is chosen, described in more detail, and applied to a medical dataset. The processing procedure for three-dimensional image data is described, along with the data preprocessing methods applied for multiclass segmentation. The final part of the thesis evaluates the results.
APA, Harvard, Vancouver, ISO, and other styles
12

Papadopoulos, Georgios. "Towards a 3D building reconstruction using spatial multisource data and computational intelligence techniques." Thesis, Limoges, 2019. http://www.theses.fr/2019LIMO0084/document.

Full text
Abstract:
Building reconstruction from aerial photographs and other multi-source urban spatial data is a task endeavored using a plethora of automated and semi-automated methods ranging from point processes, classic image processing and laser scanning. In this thesis, an iterative relaxation system is developed based on the examination of the local context of each edge according to multiple spatial input sources (optical, elevation, shadow and foliage masks, as well as other pre-processed data, as elaborated in Chapter 6). All these multisource and multiresolution data are fused so that probable line segments or edges are extracted that correspond to prominent building boundaries. Two novel sub-systems have also been developed in this thesis. They were designed with the purpose of providing additional, more reliable information regarding building contours in a future version of the proposed relaxation system. The first is a deep convolutional neural network (CNN) method for the detection of building borders. In particular, the network is based on the state-of-the-art super-resolution model SRCNN (Dong C. L., 2015). It accepts aerial photographs depicting densely populated urban area data as well as their corresponding digital elevation maps (DEM). Training is performed using three variations of this urban dataset and aims at detecting building contours through a novel super-resolved heteroassociative mapping. Another innovation of this approach is the design of a modified custom loss layer named Top-N. In this variation, the mean square error (MSE) between the reconstructed output image and the provided ground truth (GT) image of building contours is computed on the 2N image pixels with the highest values. Assuming that most of the N contour pixels of the GT image are also in the top 2N pixels of the reconstruction, this modification balances the two pixel categories and improves the generalization behavior of the CNN model. The experiments show that the Top-N cost function offers performance gains in comparison to standard MSE. Further improvement in the generalization ability of the network is achieved by using dropout. The second sub-system is a super-resolution deep convolutional network, which performs an enhanced-input associative mapping between input low-resolution and high-resolution images. This network has been trained with low-resolution elevation data and the corresponding high-resolution optical urban photographs. Such a resolution discrepancy between optical aerial/satellite images and elevation data is often the case in real-world applications. More specifically, low-resolution elevation data augmented by high-resolution optical aerial photographs are used with the aim of augmenting the resolution of the elevation data. This is a unique super-resolution problem where it was found that many of the proposed general-image SR methods do not perform as well. The network, aptly named building super-resolution CNN (BSRCNN), is trained using patches extracted from the aforementioned data. Results show that, in comparison with a classic bicubic upscale of the elevation data, the proposed implementation offers important improvement as attested by modified PSNR and SSIM metrics. In comparison, other proposed general-image SR methods performed worse than a standard bicubic up-scaler. Finally, the relaxation system fuses together all these multisource data sources, comprising pre-processed optical data, elevation data, foliage masks, shadow masks and other pre-processed data, in an attempt to assign confidence values to each pixel belonging to a building contour. Confidence is augmented or decremented iteratively until the MSE error falls below a specified threshold or a maximum number of iterations has been executed. The confidence matrix can then be used to extract the true building contours via thresholding.
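The Top-N loss described above admits a compact implementation: the squared error is computed only over the 2N output pixels with the highest values. The sketch below is an assumption of the exact selection rule (PyTorch assumed; using a single N for the whole batch is a simplification):

```python
# Sketch of a Top-N style loss: MSE restricted to the 2N pixels of the
# reconstruction with the highest values. Details of the pixel-selection
# rule are assumed, not taken verbatim from the thesis.
import torch

def top_n_mse(recon: torch.Tensor, gt: torch.Tensor, n: int) -> torch.Tensor:
    """recon, gt: flattened images of shape (batch, pixels)."""
    _, idx = recon.topk(2 * n, dim=1)                 # 2N strongest output pixels
    diff = recon.gather(1, idx) - gt.gather(1, idx)
    return (diff ** 2).mean()

recon = torch.rand(4, 256 * 256)
gt = (torch.rand(4, 256 * 256) > 0.99).float()        # sparse contour pixels
n = max(1, int(gt.sum(dim=1).max()))                  # N ~ contour pixel count
loss = top_n_mse(recon, gt, n)
```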
APA, Harvard, Vancouver, ISO, and other styles
13

Skácel, Dalibor. "Navigace pomocí hlubokých konvolučních sítí." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. http://www.nusl.cz/ntk/nusl-386026.

Full text
Abstract:
In this thesis I deal with the problem of navigation and autonomous driving using convolutional neural networks. I focus on the main approaches utilizing sensory inputs described in the literature, and on the theory of neural networks, imitation learning and reinforcement learning. I also discuss the tools and methods applicable to driving systems. I created two deep learning models for autonomous driving in a simulated environment. These models use the Dataset Aggregation and Deep Deterministic Policy Gradient algorithms. I tested the created models in the TORCS car racing simulator and compared the results with available sources.
APA, Harvard, Vancouver, ISO, and other styles
14

Pllashniku, Edlir, and Zolal Stanikzai. "Normalization of Deep and Shallow CNNs tasked with Medical 3D PET-scans : Analysis of technique applicability." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-45521.

Full text
Abstract:
There has in recent years been interdisciplinary research on utilizing machine learning for detecting and classifying neurodegenerative disorders, with the sole goal of outperforming state-of-the-art models in terms of metrics such as accuracy, specificity, and sensitivity. Specifically, these studies have been conducted using existing networks on ”novel” methods of pre-processing data or by developing new convolutional neural networks. As of now, no work has looked into how different normalization techniques affect a deep or shallow convolutional neural network in terms of numerical stability, performance, explainability, and interpretability. This work delves into which normalization technique is most suitable for deep and shallow convolutional neural networks. Two baselines were created, one shallow and one deep, and eight different normalization techniques were applied to these model architectures. Conclusions were drawn based on our analysis of numerical stability, performance (metrics), and methods of Explainable Artificial Intelligence. Our findings indicate that normalization techniques affect models differently regarding the mentioned aspects of our analysis, especially numerical stability and explainability. Moreover, we show that there should indeed be a preference to select one method over the other in future studies of this interdisciplinary field.
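Comparisons like the one above are usually implemented by swapping a single normalization layer in an otherwise fixed architecture. A minimal sketch of that pattern, assuming PyTorch; the technique list here is illustrative, not the eight studied in the thesis.

```python
# Sketch: a factory that lets the same 3D CNN be built with different
# normalization layers, so only the technique under study varies.
import torch.nn as nn

def make_norm(name: str, channels: int) -> nn.Module:
    return {
        "batch": nn.BatchNorm3d(channels),
        "instance": nn.InstanceNorm3d(channels),
        "group": nn.GroupNorm(num_groups=8, num_channels=channels),
        "layer": nn.GroupNorm(num_groups=1, num_channels=channels),  # LayerNorm-like
        "none": nn.Identity(),
    }[name]

def conv_block(in_ch: int, out_ch: int, norm: str) -> nn.Module:
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        make_norm(norm, out_ch),
        nn.ReLU(),
    )
```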
APA, Harvard, Vancouver, ISO, and other styles
15

Ekström, Marcus. "Road Surface Preview Estimation Using a Monocular Camera." Thesis, Linköpings universitet, Datorseende, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-151873.

Full text
Abstract:
Recently, sensors such as radars and cameras have been widely used in automotive applications, especially in Advanced Driver-Assistance Systems (ADAS), to collect information about the vehicle's surroundings. Stereo cameras are very popular as they can be used passively to construct a 3D representation of the scene in front of the car. This has allowed the development of several ADAS algorithms that need 3D information to perform their tasks. One interesting application is Road Surface Preview (RSP), where the task is to estimate the road height along the future path of the vehicle. An active suspension control unit can then use this information to regulate the suspension, improving driving comfort, extending the durability of the vehicle and warning the driver about potential risks on the road surface. Stereo cameras have been successfully used in RSP and have demonstrated very good performance. However, their main disadvantages are high production cost and high power consumption, which limits installing several ADAS features in economy-class vehicles. A less expensive alternative are monocular cameras, which have a significantly lower cost and power consumption. Therefore, this thesis investigates the possibility of solving the Road Surface Preview task using a monocular camera. We try two different approaches: structure-from-motion and Convolutional Neural Networks. The proposed methods are evaluated against the stereo-based system. Experiments show that both structure-from-motion and CNNs have good potential for solving the problem, but they are not yet reliable enough to be a complete solution to the RSP task and be used in an active suspension control unit.
APA, Harvard, Vancouver, ISO, and other styles
16

Regia, Corte Fabiola. "Studio ed implementazione di un modello di Human Pose Estimation 3D. Analisi tecnica della posizione del corpo dell’atleta durante un match di Tennis." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
Nowadays, without our being fully aware of it, machine learning is becoming part of the most varied sectors, professional or private: from classification in agriculture to self-driving cars; from speech recognition in education to object detection in a landscape; all the way to sport, whether individual or team, amateur or professional. It is precisely in this last area that this project is situated: we discuss the use of convolutional networks for human pose estimation in sport, specifically in tennis.
APA, Harvard, Vancouver, ISO, and other styles
17

Serra, Sabina. "Deep Learning for Semantic Segmentation of 3D Point Clouds from an Airborne LiDAR." Thesis, Linköpings universitet, Datorseende, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-168367.

Full text
Abstract:
Light Detection and Ranging (LiDAR) sensors have many different application areas, from revealing archaeological structures to aiding navigation of vehicles. However, it is challenging to interpret and fully use the vast amount of unstructured data that LiDARs collect. Automatic classification of LiDAR data would ease the utilization, whether it is for examining structures or aiding vehicles. In recent years, there have been many advances in deep learning for semantic segmentation of automotive LiDAR data, but there is less research on aerial LiDAR data. This thesis investigates the current state-of-the-art deep learning architectures, and how well they perform on LiDAR data acquired by an Unmanned Aerial Vehicle (UAV). It also investigates different training techniques for class imbalanced and limited datasets, which are common challenges for semantic segmentation networks. Lastly, this thesis investigates if pre-training can improve the performance of the models. The LiDAR scans were first projected to range images and then a fully convolutional semantic segmentation network was used. Three different training techniques were evaluated: weighted sampling, data augmentation, and grouping of classes. No improvement was observed by the weighted sampling, neither did grouping of classes have a substantial effect on the performance. Pre-training on the large public dataset SemanticKITTI resulted in a small performance improvement, but the data augmentation seemed to have the largest positive impact. The mIoU of the best model, which was trained with data augmentation, was 63.7% and it performed very well on the classes Ground, Vegetation, and Vehicle. The other classes in the UAV dataset, Person and Structure, had very little data and were challenging for most models to classify correctly. In general, the models trained on UAV data performed similarly as the state-of-the-art models trained on automotive data.
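Projecting each LiDAR sweep to a range image, as described above, is commonly done with a spherical projection. A minimal sketch under assumed sensor parameters (NumPy; the resolution and field of view are illustrative, not those used in the thesis):

```python
# Sketch of a spherical range-image projection for a LiDAR point cloud.
import numpy as np

def to_range_image(points: np.ndarray, h: int = 64, w: int = 1024,
                   fov_up: float = 15.0, fov_down: float = -25.0) -> np.ndarray:
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)                         # azimuth angle
    pitch = np.arcsin(z / np.maximum(r, 1e-8))     # elevation angle
    u = ((yaw / np.pi + 1.0) / 2.0 * w).astype(int) % w
    fov = np.radians(fov_up) - np.radians(fov_down)
    v = ((np.radians(fov_up) - pitch) / fov * h).clip(0, h - 1).astype(int)
    img = np.zeros((h, w), dtype=np.float32)
    img[v, u] = r                                  # keep last range per cell
    return img

cloud = np.random.randn(100_000, 3) * 10
range_img = to_range_image(cloud)                  # (64, 1024) range image
```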
APA, Harvard, Vancouver, ISO, and other styles
18

Gu, Dongfeng. "3D Densely Connected Convolutional Network for the Recognition of Human Shopping Actions." Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/36739.

Full text
Abstract:
In recent years, deep convolutional neural networks (CNNs) have shown remarkable results in the image domain. However, most of the neural networks in action recognition do not have very deep layers compared with CNNs in the image domain. This thesis presents a 3D Densely Connected Convolutional Network (3D-DenseNet) for action recognition that can have more than 100 layers without exhibiting performance degradation or overfitting. Our network expands Densely Connected Convolutional Networks (DenseNet) [32] to 3D-DenseNet by adding the temporal dimension to all internal convolution and pooling layers. The internal layers of our model are connected with each other in a feed-forward fashion. In each layer, the feature maps of all preceding layers are concatenated along the last dimension and used as inputs to all subsequent layers. We propose two different versions of 3D-DenseNet: general 3D-DenseNet and lite 3D-DenseNet. While general 3D-DenseNet has the same architecture as DenseNet, lite 3D-DenseNet adds a 3D pooling layer right after the first 3D convolution layer of general 3D-DenseNet to reduce the number of training parameters at the beginning, so that we can reach a deeper network. We test on two action datasets: the MERL shopping dataset [69] and the KTH dataset [63]. Our experiment results demonstrate that our method performs better than the state-of-the-art action recognition method on the MERL shopping dataset and achieves a competitive result on the KTH dataset.
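The dense connectivity pattern described above can be sketched in a few lines. This is a minimal illustration, not the thesis model; PyTorch is assumed, and concatenation is done along the channel axis of (batch, channels, frames, height, width) tensors rather than the framework-specific "last dimension" mentioned in the abstract.

```python
# Sketch of a 3D dense block: each layer receives the concatenation of
# all preceding feature maps. Channel counts are illustrative.
import torch
import torch.nn as nn

class Dense3DBlock(nn.Module):
    def __init__(self, in_ch: int, growth: int, layers: int):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm3d(in_ch + i * growth),
                nn.ReLU(),
                nn.Conv3d(in_ch + i * growth, growth, kernel_size=3, padding=1),
            )
            for i in range(layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)   # feed all previous maps forward
        return x

video = torch.randn(2, 16, 8, 32, 32)
out = Dense3DBlock(16, growth=12, layers=4)(video)   # -> (2, 64, 8, 32, 32)
```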
APA, Harvard, Vancouver, ISO, and other styles
19

Cronje, Frans. "Human action recognition with 3D convolutional neural networks." Master's thesis, University of Cape Town, 2015. http://hdl.handle.net/11427/15482.

Full text
Abstract:
Convolutional neural networks (CNNs) adapt the regular fully-connected neural network (NN) algorithm to facilitate image classification. Recently, CNNs have been demonstrated to provide superior performance across numerous image classification databases, including large natural images (Krizhevsky et al., 2012). Furthermore, CNNs are more readily transferable between different image classification problems when compared to common alternatives. The extension of CNNs to video classification is simple, and the rationale behind the components of the model still applies due to the similarity between image and video data. Previous CNNs have demonstrated good performance on video datasets, but have not employed methods that were recently developed and credited with improvements in image classification networks. The purpose of this research is to build a CNN model that includes recently developed elements, to present a human action recognition model which is up to date with current trends in CNNs and current hardware. Focus is applied to ensemble models and methods such as the Dropout technique, developed by Hinton et al. (2012) to reduce overfitting, and learning rate adaptation techniques. The KTH human action dataset is used to assess the CNN model; as a widely used benchmark dataset, it facilitates comparison with previous work in the literature. Three CNNs are built and trained to provide insight into design choices as well as to allow the construction of an ensemble model. The final ensemble model achieved performance comparable to previous CNNs trained on the KTH data. While the inclusion of new methods in the CNN model did not result in an improvement on previous models, the competitive result provides an alternative combination of architecture and components to other CNN models.
APA, Harvard, Vancouver, ISO, and other styles
20

Matteo, Lionel. "De l’image optique "multi-stéréo" à la topographie très haute résolution et la cartographie automatique des failles par apprentissage profond." Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4099.

Full text
Abstract:
Seismogenic faults are the source of earthquakes. The study of their properties thus provides information on some of the properties of the large earthquakes they might produce. Faults are 3D features, forming complex networks that generally include one master fault and myriads of secondary faults and fractures that intensely dissect the rocks embedding the master fault. In this thesis I aim to develop approaches to help study this intense secondary faulting and fracturing. To identify, map and measure the faults and fractures within dense fault networks, I have handled two challenges: 1) Faults generally form steep topographic escarpments at the ground surface that enclose narrow, deep corridors or canyons, where topography, and hence fault traces, are difficult to measure using the available standard methods (such as stereo and tri-stereo optical satellite images). To address this challenge, I used multi-stereo acquisitions with different configurations, such as different roll and pitch angles, different acquisition dates and different acquisition modes (stereo and tri-stereo). Our dataset of 37 Pléiades images over three different tectonic sites in the western USA (Valley of Fire, Nevada; Granite Dells, Arizona; Bishop Tuff, California) allowed us to test different acquisition configurations and to calculate the topography with four different approaches. Using the free open-source software Micmac (IGN; Rupnik et al., 2017), I calculated the topography in the form of Digital Surface Models (DSMs): (i) from combinations of 2 to 17 Pléiades images, (ii) by stacking and merging DSMs built from individual stereo or tri-stereo acquisitions, avoiding the use of multi-date combinations, and (iii) by stacking and merging point clouds built from tri-stereo acquisitions following the multi-view pipeline developed by Rupnik et al. (2018). As a last approach (iv), we used the recent multi-view stereo pipeline CARS (CNES/CMLA) developed by Michel et al. (2020), combining tri-stereo acquisitions. From these four approaches I calculated more than 200 DSMs, and my results suggest that combining two tri-stereo acquisitions, or one stereo and one tri-stereo acquisition with opposite roll angles, leads to the most accurate DSMs (with the most complete and precise topographic surface). 2) Commonly, faults are mapped manually in the field or from optical images and topographic data through the recognition of the specific curvilinear traces they form at the ground surface. However, manual mapping is time-consuming, which limits our capacity to produce complete representations and measurements of fault networks. To overcome this problem, we adopted a machine learning approach, namely a U-Net convolutional neural network, to automate the identification and mapping of fractures and faults in optical images and topographic data. Intentionally, we trained the CNN with a moderate amount of manually created fracture and fault maps of low resolution and basic quality, extracted from one type of optical image (standard camera photographs of the ground surface). Based on the results of a number of performance tests, we selected the best-performing model, MRef, and demonstrate its capacity to predict fractures and faults accurately in image data of various types and resolutions (ground photographs, drone and satellite images, and topographic data). The MRef predictions thus enable the statistical analysis of the fault networks. MRef exhibits good generalization capacity, making it a viable tool for fast and accurate extraction of fracture and fault networks from image and topographic data.
APA, Harvard, Vancouver, ISO, and other styles
21

Christopoulos, Charitos Andreas. "Brain disease classification using multi-channel 3D convolutional neural networks." Thesis, Linköpings universitet, Statistik och maskininlärning, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-174329.

Full text
Abstract:
Functional magnetic resonance imaging (fMRI) technology has been used in the investigation of human brain functionality and to assist in brain disease diagnosis. While fMRI can be used to model both spatial and temporal brain functionality, the analysis of fMRI images and the discovery of patterns for certain brain diseases is still a challenging task in medical imaging. Deep learning has been used more and more in the medical field in an effort to further improve disease diagnosis, due to its effectiveness in discovering high-level features in images. Convolutional neural networks (CNNs) are a class of deep learning algorithms that have been successfully used in medical imaging to extract spatial hierarchical features. The application of CNNs to fMRI and the extraction of brain functional patterns is an open field for research. This project focuses on how fMRI can be used to improve Autism Spectrum Disorder (ASD) detection and diagnosis with 3D resting-state functional MRI (rs-fMRI) images. ASDs are a range of neurodevelopmental brain disorders that mostly affect social function. Some of the symptoms include social and communication difficulties, as well as restricted and repetitive behaviors. The symptoms appear in early childhood and tend to develop over time, so an early diagnosis is required. Finding a proper model for distinguishing between ASD and healthy subjects is a challenging task and involves a lot of hyper-parameter tuning. In this project a grid search approach is followed in the quest for the optimal CNN architecture. Additionally, regularization and augmentation techniques are implemented in an effort to further improve the models' performance.
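A grid search like the one mentioned above exhaustively evaluates every combination of hyper-parameter values. A minimal sketch of that loop; the parameter names, ranges and the `train_and_validate` helper are all hypothetical, not taken from the thesis.

```python
# Sketch of a grid search over CNN hyper-parameters.
from itertools import product

grid = {
    "learning_rate": [1e-3, 1e-4],
    "dropout": [0.2, 0.5],
    "n_filters": [16, 32, 64],
}

def search(train_and_validate):
    """train_and_validate: hypothetical callable returning a validation score."""
    best_score, best_cfg = float("-inf"), None
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = train_and_validate(**cfg)     # train one model per combination
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```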
APA, Harvard, Vancouver, ISO, and other styles
22

Chen, Tairui. "Going Deeper with Convolutional Neural Network for Intelligent Transportation." Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-theses/144.

Full text
Abstract:
Over the last several decades, computer vision researchers have been devoted to finding good features to solve different tasks: object recognition, object detection, object segmentation, activity recognition and so forth. Ideal features transform raw pixel intensity values into a representation in which these computer vision problems are easier to solve. Recently, deep features from convolutional neural networks (CNNs) have attracted many researchers to solve many problems in computer vision. In the supervised setting, these hierarchies are trained to solve specific problems by minimizing an objective function for different tasks. More recently, features learned from large-scale image datasets have proved to be very effective and generic for many computer vision tasks; features learned on a recognition task can be used for object detection. This work aims to uncover the principles that lead to these generic feature representations in transfer learning, which does not require training on the dataset again but instead transfers the rich features a CNN learned from the ImageNet dataset. We begin by summarizing related prior work, particularly papers on object recognition, object detection and segmentation. We then introduce deep features to computer vision tasks in intelligent transportation systems. First, we apply deep features to object detection, especially vehicle detection. Second, to make full use of objectness proposals, we apply a proposal generator to road-marking detection and recognition. Third, to fully understand the transportation situation, we introduce deep features into road scene understanding. We experiment on different public datasets for each task and show that our framework is robust.
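The transfer-learning idea described above is commonly realized by reusing an ImageNet-trained CNN as a frozen feature extractor and training only a new task head. A minimal sketch, assuming a recent torchvision; the backbone choice and class count are illustrative.

```python
# Sketch of transfer learning: freeze ImageNet features, train a new head.
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in backbone.parameters():
    p.requires_grad = False            # keep the ImageNet features fixed

# Replace the classifier head; its fresh parameters remain trainable.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # e.g. 10 target classes
# Only backbone.fc is then optimized on the target dataset.
```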
APA, Harvard, Vancouver, ISO, and other styles
23

Hossain, Md Tahmid. "Towards robust convolutional neural networks in challenging environments." Thesis, Federation University Australia, 2021. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/181882.

Full text
Abstract:
Image classification is one of the fundamental tasks in the field of computer vision. Although the Artificial Neural Network (ANN) showed a lot of promise in this field, the lack of efficient computer hardware subdued its potential to a great extent. In the early 2000s, advances in hardware coupled with better network design saw the dramatic rise of the Convolutional Neural Network (CNN). Deep CNNs pushed the State-of-The-Art (SOTA) in a number of vision tasks, including image classification, object detection, and segmentation. Presently, CNNs dominate these tasks. Although CNNs exhibit impressive classification performance on clean images, they are vulnerable to distortions, such as noise and blur. Fine-tuning a pre-trained CNN on mutually exclusive or a union set of distortions is a brute-force solution. This iterative fine-tuning process with all known types of distortion is, however, exhaustive, and the network struggles to handle unseen distortions. CNNs are also vulnerable to image translation or shift, partly due to common Down-Sampling (DS) layers, e.g., max-pooling and strided convolution. These operations violate the Nyquist sampling rate and cause aliasing. The textbook solution is low-pass filtering (blurring) before down-sampling, which can benefit deep networks as well. Even so, non-linearity units, such as ReLU, often re-introduce the problem, suggesting that blurring alone may not suffice. Another important but under-explored issue for CNNs is unknown or Open Set Recognition (OSR). CNNs are commonly designed for closed-set arrangements, where test instances only belong to some ‘Known Known’ (KK) classes used in training. As such, they predict a class label for a test sample based on the distribution of the KK classes. However, when used under the OSR setup (where an input may belong to an ‘Unknown Unknown’ or UU class), such a network will always classify a test instance as one of the KK classes, even if it is from a UU class. Historically, CNNs have struggled with detecting objects in images with large differences in scale, especially small objects. This is because the DS layers inside a CNN often progressively wipe out the signal from small objects. As a result, the final layers are left with no signature from these objects, leading to degraded performance. In this work, we propose solutions to the above four problems. First, we improve CNN robustness against distortion by proposing DCT-based augmentation, adaptive regularisation, and noise-suppressing Activation Functions (AF). Second, to ensure further performance gain and robustness to image transformations, we introduce anti-aliasing properties inside the AF and propose a novel DS method called blurpool. Third, to address the OSR problem, we propose a novel training paradigm that ensures detection of UU classes and accurate classification of the KK classes. Finally, we introduce a novel CNN that enables a deep detector to identify small objects with high precision and recall. We evaluate our methods on a number of benchmark datasets and demonstrate that they outperform contemporary methods in the respective problem set-ups.
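The blur-before-downsample idea mentioned above can be made concrete in a few lines: a fixed low-pass filter is applied depthwise before subsampling, instead of naive striding. A sketch of the general technique under the assumption of a 3x3 binomial kernel and PyTorch; this is not the thesis's exact blurpool design.

```python
# Sketch of anti-aliased down-sampling: blur with a fixed binomial
# filter (depthwise), then subsample with the given stride.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        k = torch.tensor([1.0, 2.0, 1.0])
        k = torch.outer(k, k)                       # 3x3 binomial kernel
        k = (k / k.sum()).expand(channels, 1, 3, 3).contiguous()
        self.register_buffer("kernel", k)
        self.stride, self.channels = stride, channels

    def forward(self, x):
        return F.conv2d(x, self.kernel, stride=self.stride,
                        padding=1, groups=self.channels)  # depthwise blur

x = torch.randn(1, 32, 56, 56)
y = BlurPool2d(32)(x)          # -> (1, 32, 28, 28), anti-aliased
```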
APA, Harvard, Vancouver, ISO, and other styles
24

Sällqvist, Jessica. "Real-time 3D Semantic Segmentation of Timber Loads with Convolutional Neural Networks." Thesis, Linköpings universitet, Datorseende, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-148862.

Full text
Abstract:
Volume measurement of timber loads is done in conjunction with timber trade. When dealing with goods of major economic value such as these, it is important to achieve an impartial and fair assessment when determining price-based volumes. With the help of Saab's missile targeting technology, CIND AB develops products for digital volume measurement of timber loads. Currently there is a system in operation that automatically reconstructs timber trucks in motion to create measurable images of them. Future iterations of the system are expected to fully automate the scaling by generating a volumetric representation of the timber and calculating its external gross volume. The first challenge towards this development is to separate the timber load from the truck. This thesis aims to evaluate and implement an appropriate method for semantic pixel-wise segmentation of timber loads in real time. Image segmentation is a classic but difficult problem in computer vision. To achieve greater robustness, it is therefore important to carefully study and make use of the conditions given by the existing system. Variations in timber type, truck type and packing together create unique combinations that the system must be able to handle. The system must work around the clock in different weather conditions while maintaining high precision and performance.
APA, Harvard, Vancouver, ISO, and other styles
25

Wiklander, Marcus. "Classification of tree species from 3D point clouds using convolutional neural networks." Thesis, Umeå universitet, Institutionen för fysik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-174662.

Full text
Abstract:
In forest management, knowledge about a forest's distribution of tree species is key. Being able to automate tree species classification for large forest areas is of great interest, since doing it manually is tedious and costly labour. In this project, the aim was to investigate the efficiency of classifying individual tree species (pine, spruce and deciduous forest) from 3D point clouds acquired by airborne laser scanning (ALS), using convolutional neural networks. Raw data consisted of 3D point clouds and photographic images of forests in northern Sweden, collected from a helicopter flying at low altitudes. The point cloud of each individual tree was connected to its representation in the photos, which allowed for manual labeling of training data to be used for training of convolutional neural networks. The training data consisted of labels and 2D projections created from the point clouds, represented as images. Two different convolutional neural networks were trained and tested: an adaptation of the LeNet architecture and the ResNet architecture. Both networks reached an accuracy close to 98%, the LeNet adaptation having a slightly lower loss score for both validation and test data compared to that of ResNet. Confusion matrices for both networks showed similar F1 scores for all tree species, between 97% and 98%. The accuracies computed for both networks were found to be higher than those achieved in similar studies using ALS data to classify individual tree species. However, the results in this project were never tested against a true population sample to confirm the accuracy. To conclude, the use of convolutional neural networks is indeed an efficient method for classification of tree species, but further studies on unbiased data are needed to validate these results.
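The 2D-projection step described above can be illustrated with a hypothetical numpy sketch that rasterizes a single tree's points into a side-view density image; the grid size, axis choice and normalisation are assumptions for illustration, not the thesis pipeline:

# Rasterise a tree's 3D points (x, y, z) into a 2D side-view image by
# dropping one horizontal axis and binning the rest into a fixed grid.
import numpy as np

def side_view_image(points, size=64):
    xyz = np.asarray(points, dtype=float)
    x, z = xyz[:, 0], xyz[:, 2]           # keep one horizontal axis plus height
    ix = ((x - x.min()) / (np.ptp(x) + 1e-9) * (size - 1)).astype(int)
    iz = ((z - z.min()) / (np.ptp(z) + 1e-9) * (size - 1)).astype(int)
    img = np.zeros((size, size), dtype=np.float32)
    np.add.at(img, (size - 1 - iz, ix), 1.0)  # point density per pixel
    return img / img.max()                    # normalised grey-scale image

cloud = np.random.rand(5000, 3) * [2.0, 2.0, 10.0]  # fake tree, 10 m tall
print(side_view_image(cloud).shape)  # (64, 64)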
APA, Harvard, Vancouver, ISO, and other styles
26

Martell, Patrick Keith. "Hierarchical Auto-Associative Polynomial Convolutional Neural Networks." University of Dayton / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1513164029518038.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Svensson, Göran, and Jonas Westlund. "Intravenous bag monitoring with Convolutional Neural Networks." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-148449.

Full text
Abstract:
Drip bags are used in hospital environments to administer drugs and nutrition to patients. Ensuring that they are used correctly and are refilled in time is important for the safety of patients. This study examines the use of a Convolutional Neural Network (CNN) to monitor the fluid levels of drip bags via image recognition, to potentially form the base of an early warning system and assist in making medical care more efficient. Videos of drip bags were recorded as they were emptying their contents, in a controlled environment and from different angles. A CNN was built to analyze the recorded data in order to predict a bag's fluid level with 5% interval precision from a given image. The results show that the CNN used performs poorly when monitoring fluid levels in drip bags.
APA, Harvard, Vancouver, ISO, and other styles
28

Khasgiwala, Anuj. "Word Recognition in Nutrition Labels with Convolutional Neural Network." DigitalCommons@USU, 2018. https://digitalcommons.usu.edu/etd/7101.

Full text
Abstract:
Nowadays everyone is very busy, running around trying to maintain a balance between work and family as working hours increase day by day. In such a hassled life, people either ignore or do not give enough attention to a healthy diet. An imperative part of a healthy eating routine is the awareness and maintenance of nutritional data and the comprehension of how different foods and nutritional constituents influence our bodies. In the USA, as in many other countries, nutritional information is fundamentally passed on to consumers through nutrition labels (NLs), which can be found on all packaged food products in the form of a nutrition table. However, it sometimes turns out to be challenging to utilize the information available in these NLs, even for consumers who are health conscious, as they may not be familiar with nutritional terms and may find it hard to relate nutritional information to their daily activities because of a lack of time, motivation, or training. It is therefore essential to automate this information gathering and interpretation procedure by incorporating a Machine Learning based algorithm to extract nutritional information from NLs, since this enhances the consumer's capacity to participate in continuous nutritional information gathering and analysis.
APA, Harvard, Vancouver, ISO, and other styles
29

Wang, Run Fen. "Semantic Text Matching Using Convolutional Neural Networks." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-362134.

Full text
Abstract:
Semantic text matching is a fundamental task for many applications in Natural Language Processing (NLP). Traditional methods using term frequency-inverse document frequency (TF-IDF) to match exact words in documents have one strong drawback: TF-IDF is unable to capture semantic relations between closely-related words, which leads to disappointing matching results. Neural networks have recently been used for various applications in NLP and have achieved state-of-the-art performance on many tasks. Recurrent Neural Networks (RNNs) have been tested on text classification and text matching, but did not gain any remarkable results, since RNNs work more effectively on short texts than on long documents. In this paper, Convolutional Neural Networks (CNNs) are applied to match texts in a semantic aspect. The method uses word embedding representations of two texts as inputs to the CNN, extracts the semantic features between the two texts, and gives a score as output indicating how certain the CNN model is that they match. The results show that after some tuning of the parameters the CNN model could produce accuracy, precision, recall and F1-scores all over 80%. This is a great improvement over the previous TF-IDF results, and further improvements could be made by using dynamic word vectors, better pre-processing of the data, generating larger and more feature-rich data sets, and further tuning of the parameters.
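The TF-IDF drawback described above is easy to demonstrate with scikit-learn: two sentences with the same meaning but no shared vocabulary receive a cosine similarity of zero. The example sentences are illustrative stand-ins:

# TF-IDF cannot see that these two sentences say the same thing.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the car drives on the road",
        "an automobile travels along a street"]
tfidf = TfidfVectorizer().fit_transform(docs)
print(cosine_similarity(tfidf[0], tfidf[1]))  # [[0.]] -- no shared terms, same meaning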
APA, Harvard, Vancouver, ISO, and other styles
30

Reiling, Anthony J. "Convolutional Neural Network Optimization Using Genetic Algorithms." University of Dayton / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1512662981172387.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Andersson, Viktor. "Semantic Segmentation : Using Convolutional Neural Networks and Sparse dictionaries." Thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-139367.

Full text
Abstract:
The two main bottlenecks when using deep neural networks are data dependency and training time. This thesis proposes a novel method for weight initialization of the convolutional layers in a convolutional neural network, introducing the usage of sparse dictionaries. A sparse dictionary optimized on domain-specific data can be seen as a set of intelligent feature-extracting filters. This thesis investigates the effect of using such filters as kernels in the convolutional layers of the neural network: how do they affect the training time and final performance? The dataset used here is the Cityscapes dataset, a library of 25000 labeled road scene images. The sparse dictionary was acquired using the K-SVD method. The filters were added to two different networks whose performance was tested individually, one architecture being much deeper than the other. The results, presented for both networks, show that filter initialization is an important aspect which should be taken into consideration while training deep networks for semantic segmentation.
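A minimal sketch of this initialization idea follows, with scikit-learn's MiniBatchDictionaryLearning standing in for the K-SVD solver used in the thesis; the atom count, patch size and single-channel setup are illustrative assumptions:

# Learn dictionary atoms on image patches, then load them as conv kernels.
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning

image = np.random.rand(128, 128)                       # stand-in training image
patches = extract_patches_2d(image, (7, 7), max_patches=2000)
patches = patches.reshape(len(patches), -1)
patches -= patches.mean(axis=1, keepdims=True)         # zero-mean patches

dico = MiniBatchDictionaryLearning(n_components=32, alpha=1.0).fit(patches)
atoms = dico.components_.reshape(32, 1, 7, 7)          # 32 single-channel kernels

conv = nn.Conv2d(1, 32, kernel_size=7, padding=3)
with torch.no_grad():
    conv.weight.copy_(torch.from_numpy(atoms).float())  # dictionary-initialised layer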
APA, Harvard, Vancouver, ISO, and other styles
32

Andriolo, Stefano. "Convolutional Neural Networks in Tomographic Image Enhancement." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22843/.

Full text
Abstract:
Convolutional Neural Networks have seen a huge rise in popularity in image applications. They have been used in medical imaging contexts to enhance the overall quality of the digital representation of the patient's scanned body region, and have been very useful when dealing with limited-angle tomographic data. In this thesis, a particular type of convolutional neural network called Unet will be used as the starting point to explore the effectiveness of different networks in enhancing tomographic image reconstructions. We first make minor tweaks to the 2-dimensional convolutional network and train it on two different datasets. After that, we take advantage of the shape of the reconstructions under consideration to extend the convolutions to the third dimension. The scanner layout considered for projecting and reconstructing volumes in this thesis indeed consists of a cone-beam geometry, whose output is a volume that approximates the original scanned object. We then discuss the results in order to understand whether the proposed solutions could be viable approaches for enhancing tomographic images.
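The 2D-to-3D extension amounts to swapping planar convolutions for volumetric ones so that the kernels span the reconstructed volume. A minimal PyTorch sketch of such a Unet-style building block, with layer sizes chosen purely for illustration:

# The same double-conv block, in either 2D or 3D, selected by a flag.
import torch
import torch.nn as nn

def block(in_ch, out_ch, dims=3):
    Conv = nn.Conv3d if dims == 3 else nn.Conv2d
    BN = nn.BatchNorm3d if dims == 3 else nn.BatchNorm2d
    return nn.Sequential(
        Conv(in_ch, out_ch, kernel_size=3, padding=1), BN(out_ch), nn.ReLU(),
        Conv(out_ch, out_ch, kernel_size=3, padding=1), BN(out_ch), nn.ReLU(),
    )

vol = torch.randn(1, 1, 32, 64, 64)   # a reconstructed volume (D, H, W)
print(block(1, 16)(vol).shape)        # torch.Size([1, 16, 32, 64, 64])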
APA, Harvard, Vancouver, ISO, and other styles
33

Li, Xile. "Real-time Multi-face Tracking with Labels based on Convolutional Neural Networks." Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/36707.

Full text
Abstract:
This thesis presents a real-time multi-face tracking system, which is able to track multiple faces in live videos, broadcasts, real-time conference recordings, etc. The real-time output is one of its most significant advantages. Our proposed tracking system comprises three parts: face detection, feature extraction and tracking. We deploy a three-layer Convolutional Neural Network (CNN) to detect a face, a one-layer CNN to extract the features of a detected face, and a shallow network for face tracking based on the extracted feature maps of the face. The performance of our multi-face tracking system enables the tracker to run in real time without any on-line training. The algorithm does not need any parameter changes for different input video conditions, and the runtime cost is not affected significantly by an increase in the number of faces being tracked. In addition, our proposed tracker can overcome most of the generally difficult tracking conditions, including video containing a camera cut, face occlusion, false positive face detection, and false negative face detection, e.g. due to faces at the image boundary or faces shown in profile. We use two commonly used metrics to evaluate the performance of our multi-face tracking system, demonstrating that it achieves accurate results. Our multi-face tracker achieves an average runtime cost of around 0.035 s with GPU acceleration, and this runtime cost stays close to stable even as the number of tracked faces increases. All evaluation results and comparisons are tested on four commonly used video data sets.
APA, Harvard, Vancouver, ISO, and other styles
34

Nikzad, Dehaji Mohammad. "Structural Improvements of Convolutional Neural Networks." Thesis, Griffith University, 2021. http://hdl.handle.net/10072/410448.

Full text
Abstract:
Over the last decade, deep learning has demonstrated outstanding performance in almost every application domain. Among different types of deep frameworks, convolutional neural networks (CNNs), inspired by the biological processes of the visual system, can learn to extract discriminative features from raw inputs without any prior manipulation. However, efficient information circulation and the ability to explore effective new features are still two key and challenging factors for a successful deep neural network. In this thesis, we aim at presenting novel structural improvements of CNN frameworks to enhance the effectiveness and efficiency of their feature exploring and exploiting capability. To this end, we first propose a novel residual-dense lattice network (RDL-Net), a 2-dimensional triangular lattice of convolutional units connected using residual and dense connections. RDL-Net effectively harnesses the advantages of both residual and dense aggregations without over-allocating parameters for feature re-usage. This property improves the network's capacity to extract and exploit features effectively yet efficiently. Furthermore, our extensive experimental investigation in processing 1D sequential speech signals shows that RDL-Nets can achieve higher speech enhancement performance than many state-of-the-art CNN-based speech enhancement approaches. Further, we modify the RDL topology to be applicable to spatial (2D) signals. Hence, inspired by the RDL-Net innovation, we present an attention-based pyramid dilated lattice network (APDL-Net) for blind image denoising. The proposed framework employs a novel pyramid dilated convolution strategy alongside a channel-wise attention mechanism to effectively capture contextual information corresponding to different noise levels through the training of a single model. Extensive empirical studies on image denoising and JPEG artifact suppression tasks verify the effectiveness and efficiency of the APDL architecture. We also investigate the capability of the lattice topology for hyperspectral image classification. For this purpose, we introduce a new attention-based lattice network (ALN) empowered by a unique joint spectral-spatial attention mechanism to capture spectral and spatial information effectively. The proposed ALN achieves superior accuracy and computational efficiency against state-of-the-art deep learning benchmark approaches for hyperspectral image classification. In addition to the above architectural improvements of CNNs, and inspired by geographical analysis, we propose a novel channel-wise spatially autocorrelated (CSA) attention mechanism. The proposed CSA exploits the spatial relationships between the channels of feature maps. It also employs a unique hybrid spatial contiguity measure based on directional metrics to effectively measure the degree of spatial closeness between feature maps. Furthermore, CSA imposes negligible learning parameters and light computational overhead on the deep model, making it a powerful yet efficient attention module of choice. Experimental results on large-scale image classification and object detection datasets demonstrate that CSA-Nets can consistently achieve superior performance over different state-of-the-art attention-based CNNs. Besides the above architectural and attention-based advances, this research presents a simple and novel feature pooling method, gradient-based pooling (GP).
This method considers the spatial gradient of the pixels within a pooling region as the key to picking out possibly discriminative information, whereas other common pooling methods mostly rely on pixel values. The superiority of GP over other pooling methods is demonstrated through experiments on different benchmark image classification tasks.
Thesis (PhD Doctorate), Doctor of Philosophy (PhD), School of Eng & Built Env, Science, Environment, Engineering and Technology
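The gradient-based pooling idea lends itself to a compact sketch. The numpy code below is a hypothetical reading of the description above; the selection rule (keep, in each 2x2 window, the value whose local gradient magnitude is largest) is an assumption, not the thesis implementation:

# Pool by gradient magnitude instead of by value.
import numpy as np

def gradient_pool2x2(x):
    gy, gx = np.gradient(x)
    mag = np.hypot(gx, gy)                  # per-pixel gradient magnitude
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    vals = x[:h, :w].reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(-1, 4)
    mags = mag[:h, :w].reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(-1, 4)
    picked = vals[np.arange(len(vals)), mags.argmax(axis=1)]
    return picked.reshape(h // 2, w // 2)

x = np.random.rand(8, 8)
print(gradient_pool2x2(x).shape)  # (4, 4)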
APA, Harvard, Vancouver, ISO, and other styles
35

Sure, Venkata Leela. "Enhanced Approach for the Classification of Ulcerative Colitis Severity in Colonoscopy Videos Using CNN." Thesis, University of North Texas, 2019. https://digital.library.unt.edu/ark:/67531/metadc1538703/.

Full text
Abstract:
Ulcerative colitis (UC) is a chronic inflammatory disease characterized by periods of relapse and remission, affecting more than 500,000 people in the United States. To achieve the therapeutic goals of UC, which are to first induce and then maintain disease remission, doctors need to evaluate the severity of a patient's UC. However, it is very difficult to evaluate the severity of UC objectively because of the non-uniform nature of symptoms and the large variations in their patterns. To address this, in our previous works we developed two different approaches, one using image textures and the other using a CNN (convolutional neural network), to objectively measure and classify the severity of UC presented in optical colonoscopy video frames. We found, however, that the image-texture approach could not handle the large number of variations in these patterns, and the CNN-based approach could not achieve very high accuracy. In this paper, we improve our CNN-based approach in two ways to provide better classification accuracy: we add more thorough and essential preprocessing, and we generate more classes to accommodate the large variations in patterns. The experimental results show that the proposed preprocessing can improve the overall accuracy of evaluating the severity of UC.
APA, Harvard, Vancouver, ISO, and other styles
36

Singh, Vineeta. "Understanding convolutional networks and semantic similarity." University of Cincinnati / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1593269596368388.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

LEONARDI, MARCO. "Image Collection Management using Convolutional Neural Networks." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2022. http://hdl.handle.net/10281/365014.

Full text
Abstract:
Almost everyone carries a high-quality camera in their smartphone, and for the last two decades people have increasingly used images and videos in their everyday communication. As storage prices decrease, the number of photos stored keeps increasing, leading to collections of images whose size begins to be a barrier to reliving the captured moments and exploring them: we are submerged by images. In order to ease the problem of oversized image collections, methods that aim to select a subset of photos that best represents them have been designed and proposed in the literature. Those methods typically rely upon the prediction of perceptual features such as image quality, aesthetics, and memorability to select the best images. This thesis starts from the fundamental image properties that guide image selection, namely image quality and image aesthetics. First, perceived image quality assessment is investigated in an anomaly detection manner, contrary to the more common regression formulation. This is because, rather than predicting a score that best correlates with the average human opinion, being able to distinguish good-quality images from bad ones is more suitable for the image collection management problem; furthermore, it requires fewer images to tune the model. Then the problem of automatic assessment of image aesthetics is introduced: first through a method that learns the aesthetics of a picture on the basis of the prediction of aesthetics-related attributes, and then through a new solution that takes into account the semantic content, the artistic style, and the composition of the image. One of the reasons people take photos is to capture important situations in order to recall them later, usually with the intention of afterwards sharing the photos with other people such as friends or family members. Photos can be seen as a concrete link between our memories and experienced events, and image memorability can be helpful in the organization of the selected images to better bind the memory of experienced events to the taken images. To this end, this thesis presents a method for the estimation of still image memorability; in particular, the proposed method goes in the direction of breaking down the intrinsic image properties that influence the memorability of pictures. Finally, image collections tend to contain several similar images, because people usually take a series of photos of the same scene to ensure the best shot. To guarantee a diverse and representative selection of images from a large collection, this thesis concludes by proposing a flexible and innovative framework that can be used both to explore large-scale image datasets and to summarize photo albums. The proposed method is designed to exploit different aspects of the images, such as the scene category, image quality, and image aesthetics.
APA, Harvard, Vancouver, ISO, and other styles
38

Buratti, Luca. "Visualisation of Convolutional Neural Networks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Find full text
Abstract:
Neural Networks, and in particular Convolutional Neural Networks, have recently demonstrated extraordinary results in various fields. Unfortunately, however, there is still no clear understanding of why these architectures work so well, and it is especially difficult to explain their behaviour in the case of failures. This lack of clarity is what separates these models from being applied in concrete and critical real-life scenarios, such as healthcare or self-driving cars. For this reason, several studies have been carried out in recent years to create methods capable of explaining what is happening inside a neural network, or where the network is looking in order to predict in a certain way. These visualisation techniques are the centre of this thesis and the bridge between the two case studies presented below. The purpose of this work is therefore twofold: first, to use these methods to analyse and thus understand how to improve applications based on convolutional neural networks; and second, to investigate the generalisation capability of these architectures, again thanks to these methods.
APA, Harvard, Vancouver, ISO, and other styles
39

Moukari, Michel. "Estimation de profondeur à partir d'images monoculaires par apprentissage profond." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC211/document.

Full text
Abstract:
Computer vision is a branch of artificial intelligence whose purpose is to enable a machine to analyze, process and understand the content of digital images. Scene understanding in particular is a major issue in computer vision. It requires a characterization of the image that is both semantic and structural, on the one hand to describe its content and, on the other hand, to understand its geometry. However, while real space is three-dimensional, the image representing it is two-dimensional. Part of the 3D information is thus lost during the process of image formation, and it is therefore non-trivial to describe the geometry of a scene from 2D images of it. There are several ways to retrieve the depth information lost in the image. In this thesis we are interested in estimating a depth map given a single image of the scene. In this case, the depth information corresponds, for each pixel, to the distance between the camera and the object represented in that pixel. The automatic estimation of a distance map of the scene from an image is indeed a critical algorithmic brick in a very large number of domains, in particular that of autonomous vehicles (obstacle detection, navigation aids). Although the problem of estimating depth from a single image is difficult and inherently ill-posed, we know that humans can appreciate distances with one eye. This capacity is not innate but acquired, and is made possible mostly thanks to the identification of cues reflecting prior knowledge of the surrounding objects. Moreover, we know that learning algorithms can extract these cues directly from images. We are particularly interested in statistical learning methods based on deep neural networks, which have recently led to major breakthroughs in many fields, and we study the case of monocular depth estimation.
APA, Harvard, Vancouver, ISO, and other styles
40

Carpani, Valerio. "CNN-based video analytics." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Find full text
Abstract:
The content of this thesis illustrates the six months of work done during my internship at TKH Security Solutions - Siqura B.V. in Gouda, Netherlands. The aim of this thesis is to investigate possible usages of convolutional neural networks from two different points of view: first, we propose a novel algorithm for person re-identification; second, we propose a deployment chain for bringing research concepts to product-ready solutions. In existing works, the person re-identification task is assumed to be independent of the person detection task. In this thesis, instead, we consider the two tasks as linked: features produced by an object detection convolutional neural network (CNN) contain useful information which is not being used by current re-identification methods. We propose several solutions for learning a metric on CNN features to distinguish between different identities. The best of these solutions is then compared with state-of-the-art alternatives on the popular Market-1501 dataset. Results show that our method outperforms them in computational efficiency, with only a reasonable loss in accuracy. For this reason, we believe that the proposed method can be more appropriate than current state-of-the-art methods in situations where computational efficiency is critical, such as embedded applications. The deployment chain we propose in this thesis has two main goals: it must be flexible enough to accommodate new advancements in network architecture, and it must be able to deploy neural networks both on server and embedded platforms. We tested several frameworks on several platforms and ended up with a deployment chain that relies on the open source format ONNX.
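A minimal sketch of the last link in such a chain is exporting a trained PyTorch model to the ONNX format, so that server and embedded runtimes can load the same file; the stand-in model and tensor names below are illustrative, not the thesis code:

# Export a trained network to ONNX for framework-independent deployment.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None).eval()  # stand-in network
dummy = torch.randn(1, 3, 224, 224)                       # fixed input shape
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["image"], output_names=["features"])
# the .onnx file can then be served, e.g. with onnxruntime, on either platform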
APA, Harvard, Vancouver, ISO, and other styles
41

Saxena, Shreyas. "Apprentissage de représentations pour la reconnaissance visuelle." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM080/document.

Full text
Abstract:
In this dissertation, we propose methods and data-driven machine learning solutions which address and benefit from the recent overwhelming growth of digital media content. First, we consider the problem of improving the efficiency of image retrieval. We propose a coordinated local metric learning (CLML) approach which learns local Mahalanobis metrics and integrates them in a global representation where the l2 distance can be used. This allows for data visualization in a single view and the use of efficient l2-based retrieval methods. Our approach can be interpreted as learning a linear projection on top of an explicit high-dimensional embedding of a kernel. This interpretation allows for the use of existing frameworks for Mahalanobis metric learning to learn local metrics in a coordinated manner. Our experiments show that CLML improves over previous global and local metric learning approaches for the task of face retrieval. Second, we present an approach to leverage the success of CNN models for visible spectrum face recognition to improve heterogeneous face recognition, e.g., recognition of near-infrared images from visible spectrum training images. We explore different metric learning strategies over features from the intermediate layers of the networks to reduce the discrepancies between the different modalities. In our experiments we found that the depth of the optimal features for a given modality is positively correlated with the domain shift between the source domain (CNN training data) and the target domain. Experimental results show that we can use CNNs trained on visible spectrum images to obtain results that improve over the state of the art for heterogeneous face recognition with near-infrared images and sketches. Third, we present convolutional neural fabrics for exploring the discrete and exponentially large CNN architecture space in an efficient and systematic manner. Instead of aiming to select a single optimal architecture, we propose a "fabric" that embeds an exponentially large number of architectures. The fabric consists of a 3D trellis that connects response maps at different layers, scales, and channels with a sparse homogeneous local connectivity pattern. The only hyperparameters of the fabric (the number of channels and layers) are not critical for performance. The acyclic nature of the fabric allows us to use backpropagation for learning. Learning can thus efficiently configure the fabric to implement each one of exponentially many architectures and, more generally, ensembles of all of them. While scaling linearly in terms of computation and memory requirements, the fabric leverages exponentially many chain-structured architectures in parallel by massively sharing weights between them. We present benchmark results competitive with the state of the art for image classification on MNIST and CIFAR10, and for semantic segmentation on the Part Labels dataset.
APA, Harvard, Vancouver, ISO, and other styles
42

Ahlin, Björn, and Marcus Gärdin. "Automated Classification of Steel Samples : An investigation using Convolutional Neural Networks." Thesis, KTH, Materialvetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209669.

Full text
Abstract:
Automated image recognition software has previously been used for various analyses in the steel-making industry. In this study, the possibility of applying such software to classify Scanning Electron Microscope (SEM) images of two steel samples was investigated. The two steel samples were of the same steel grade, with the difference that they had been treated with calcium for different lengths of time. To enable automated image recognition, a Convolutional Neural Network (CNN) was built. The software was constructed with open source code provided by Keras Documentation, thus ensuring an easily reproducible program. The network was trained, validated and tested, first on non-binarized images and then on binarized images. Binarized images were used to ensure that the network's prediction only considers the inclusion information and not the substrate. The non-binarized images gave a classification accuracy of 99.99%; for the binarized images, the classification accuracy obtained was 67.9%. The results show that it is possible to classify steel samples using CNNs. One interesting aspect of this success is that further studies on CNNs could enable automated classification of inclusions.
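The binarization step can be illustrated with OpenCV; Otsu thresholding below is an assumed stand-in, since the abstract does not state the exact thresholding rule used in the thesis:

# Binarize an SEM image so that only inclusion pixels remain for the classifier.
import cv2
import numpy as np

sem = (np.random.rand(512, 512) * 255).astype(np.uint8)   # stand-in SEM image
_, binary = cv2.threshold(sem, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(binary.min(), binary.max())  # 0 255: substrate suppressed, inclusions kept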
APA, Harvard, Vancouver, ISO, and other styles
43

Li, Chao. "WELD PENETRATION IDENTIFICATION BASED ON CONVOLUTIONAL NEURAL NETWORK." UKnowledge, 2019. https://uknowledge.uky.edu/ece_etds/133.

Full text
Abstract:
Weld joint penetration determination is the key factor in the welding process control area. It not only directly affects the mechanical properties of the weld joint, such as fatigue; determining it also requires much human intelligence, through either complex modeling or rich welding experience. Weld penetration status identification has therefore become an obstacle for intelligent welding systems. In this dissertation, an innovative method is proposed to detect the weld joint penetration status using machine-learning algorithms. A GTAW welding system is first built. A dot-structured laser pattern is projected onto the weld pool surface during the welding process, and the reflected laser pattern, which contains all the information about the penetration status, is captured. An experienced welder is able to determine the weld penetration status just from the reflected laser pattern; however, it is difficult to characterize the images to extract the key information used to determine penetration status. To overcome the challenges in finding the right features and accurately processing images to extract them using conventional machine vision algorithms, we propose using a convolutional neural network (CNN) to automatically extract key features and determine penetration status. Data-label pairs are needed to train a CNN, so an image acquiring system is designed to collect the reflected laser pattern and an image of the work-piece backside. Data augmentation is performed to enlarge the training data size, resulting in 270,000 training samples, 45,000 validation samples and 45,000 test samples. A six-layer convolutional neural network has been designed and trained using a revised mini-batch gradient descent optimizer. The final test accuracy is 90.7%, and a voting mechanism based on three consecutive images further improves the prediction accuracy.
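The voting mechanism can be illustrated with a majority vote over a sliding window of three frames; the exact rule used in the dissertation is not specified here, so this sketch is an assumption:

# Smooth per-frame penetration predictions by majority vote over 3 frames.
from collections import Counter

def vote(predictions, window=3):
    smoothed = []
    for i in range(len(predictions)):
        recent = predictions[max(0, i - window + 1): i + 1]
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed

frames = ["full", "full", "partial", "full", "full", "over"]
print(vote(frames))  # a lone 'partial' between 'full' frames is voted away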
APA, Harvard, Vancouver, ISO, and other styles
44

Capuzzo, Davide. "3D StixelNet Deep Neural Network for 3D object detection stixel-based." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/22017/.

Full text
Abstract:
In this thesis, a deep learning algorithm for 3D object detection from point clouds in an outdoor environment is presented. The algorithm is fed with stixels, a mid-level data representation generated from a point cloud or depth map. A stixel can be thought of as a small rectangle that starts from the base of the road and rises to the top of the obstacle, summarizing the vertical surface of an object. The goal of stixels is to compress the data coming from sensors in order to allow fast transmission without losing information. The stixel generation algorithm is a novel algorithm developed by myself, applicable both to point clouds generated by lidar and to depth maps generated by stereo and mono cameras. The main steps to create this type of data are: the elimination of points that lie on the ground plane; the creation of an average matrix that summarizes the depth of each group of stixels; and the creation of stixels by merging all the cells that belong to the same object. The generated stixels reduce the number of points from 40,000 to 1,200 for a lidar point cloud, and from 480,000 to 1,200 for a depth map. In order to extract 3D information, the stixels are fed into a deep learning algorithm adapted to receive this type of data as input. The adaptation was made starting from an existing neural network used for 3D object detection in indoor environments, modified to overcome the sparsity of the data and the large size of the scene. Despite the reduction in the number of data points, thanks to the right tuning the network created in this thesis achieves the state of the art for 3D object detection. This is a relevant result because it opens the way to the use of mid-level data representations and underlines that a reduction of points does not mean a reduction of information if the data are compressed in a smart way.
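As a rough flavour of how stixels compress a depth map, the following hypothetical numpy sketch summarises each image column by a handful of numbers after ground removal; the actual algorithm described above (average matrix, cell merging) is more elaborate than this:

# Summarise each depth-map column by the extent and mean depth of its obstacle pixels.
import numpy as np

def stixels_from_depth(depth, ground_row):
    cols = []
    for u in range(depth.shape[1]):
        column = depth[:ground_row, u]            # ignore pixels on the ground plane
        obstacle = np.where(np.isfinite(column))[0]
        if obstacle.size:
            cols.append((u, obstacle.min(), obstacle.max(),
                         float(column[obstacle].mean())))
    return cols  # (column, top row, bottom row, mean depth) per stixel

depth = np.full((375, 1242), np.nan)
depth[100:300, 400:500] = 12.0                    # one fake obstacle, 12 m away
print(len(stixels_from_depth(depth, ground_row=310)))  # 100 stixel columns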
APA, Harvard, Vancouver, ISO, and other styles
45

Habrman, David. "Face Recognition with Preprocessing and Neural Networks." Thesis, Linköpings universitet, Datorseende, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-128704.

Full text
Abstract:
Face recognition is the problem of identifying individuals in images. This thesis evaluates two methods used to determine whether pairs of face images belong to the same individual or not. The first method is a combination of principal component analysis and a neural network, and the second method is based on state-of-the-art convolutional neural networks. They are trained and evaluated using two different data sets: the first contains many images with large variations in, for example, illumination and facial expression; the second consists of fewer images with small variations. Principal component analysis allowed the use of smaller networks: the largest network has 1.7 million parameters, compared to the 7 million used in the convolutional network. The use of smaller networks lowered the training and evaluation times significantly. Principal component analysis proved to be well suited for the data set with small variations, outperforming the convolutional network, which needs larger data sets to avoid overfitting. The reduction in data dimensionality, however, led to difficulties classifying the data set with large variations; the generous amount of images in this set allowed the convolutional method to reach higher accuracies than the principal component method.
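A small scikit-learn sketch of the first method's structure, PCA followed by a compact network on image pairs; the data, component count and layer sizes are stand-ins for illustration, not the thesis configuration:

# Project faces with PCA, then classify concatenated pair projections.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

faces = np.random.rand(400, 64 * 64)             # 400 flattened face images
pca = PCA(n_components=50).fit(faces)            # 4096 -> 50 dimensions

idx_a, idx_b = np.random.randint(0, 400, (2, 300))
pairs = np.hstack([pca.transform(faces[idx_a]), pca.transform(faces[idx_b])])
same = np.random.randint(0, 2, size=300)         # stand-in same-person labels

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(pairs, same)
print(clf.predict(pairs[:5]))  # far fewer weights than the 7M-parameter CNN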
APA, Harvard, Vancouver, ISO, and other styles
46

Stigeborn, Patrik. "Generating 3D-objects using neural networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230668.

Full text
Abstract:
Enabling 2D-to-3D reconstruction is an interesting future service for Mutate AB, where this thesis was conducted. Convolutional neural networks (CNNs) are examined from different aspects in order to give a realistic perception of what this technology is capable of. The task conducted is the creation of a CNN that can be used to predict how an object from a 2D image would look in 3D. The main areas that this CNN is optimized for are quality, speed, and simplicity, where quality is the output resolution of the 3D object, speed is measured by the number of seconds it takes to complete a reconstruction, and simplicity is achieved by using machine learning (ML). Enabling this could ease the creation of 3D games and make development faster. The chosen solution uses two CNNs: the first uses convolution to extract features from an input image, and the second uses transposed convolution to predict, from the features extracted by the first network, how the object would look in 3D. This thesis uses an empirical development approach to reach an optimal solution for the CNN structure and its hyperparameters. The 3D reconstruction is inspired by a sculpting process, meaning that the reconstruction starts at a low resolution and improves it iteratively. The results show that the quality gained from each iteration grows exponentially, whilst the added time grows much less; the conclusion is therefore that the trade-off between speed and quality is in our favor. However, for commercializing this technology or deploying it in a professional environment, it is still too slow to generate high-resolution output. In this case, the CNN is also fragile when there are many unrecognized shapes in the input image.
APA, Harvard, Vancouver, ISO, and other styles
47

Norén, Gustav. "Noise Robustness of Convolutional Autoencoders and Neural Networks for LPI Radar Classification." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273604.

Full text
Abstract:
This study evaluates the noise robustness of convolutional autoencoders and neural networks for classification of Low Probability of Intercept (LPI) radar modulation type. Specifically, a number of different neural network architectures are tested in four different synthetic noise environments. Tests in Gaussian noise show that performance decreases with decreasing Signal to Noise Ratio (SNR). Training a network on all SNRs in the dataset achieved a peak performance of 70.8% at SNR = -6 dB with a denoising autoencoder and convolutional classifier setup. Tests indicate that the models have a difficult time generalizing to SNRs lower than those provided in the training data, performing roughly 10-20% worse than when those SNRs are included in the training data. If intermediate SNRs are removed from the training data, the models can generalize and perform similarly to tests where intermediate noise levels are included in the training data. When testing data is generated with different parameters from the training data, performance is underwhelming, with a peak performance of 22.0% at SNR = -6 dB. The last tests use telecom signals as additive noise instead of Gaussian noise. These tests are performed when the LPI and telecom signals appear at different frequencies, and the models perform well on such cases, with a peak performance of 80.3% at an intermediate noise level. This study also contributes a different, and more realistic, way of generating data than what is prevalent in the literature, as well as a network that performs well without the need for signal preprocessing; without preprocessing, a peak performance of 64.9% was achieved at SNR = -6 dB. It is customary to generate data such that each sample always includes the start of its signal's period, which increases performance by around 20% across all tests. In a real application, however, it is not certain that the start of a received signal can be determined.
APA, Harvard, Vancouver, ISO, and other styles
48

Tunell, John. "Classification of offensive game-emblem drawings using CNN (convolutional neural networks) and transfer learning." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-348944.

Full text
Abstract:
Convolutional neural networks (CNNs) have become an important tool for solving many of today's computer vision tasks. The technique is, however, costly, and training a network from scratch requires both a large dataset and adequate hardware. A solution to these shortcomings is to instead use a pre-trained network, an approach called transfer learning. Several studies have shown promising results applying transfer learning, but the technique requires further study. This thesis explores the capabilities of transfer learning when applied to the task of filtering out offensive cartoon drawings in the game Battlefield 1. GoogLeNet was pre-trained on ImageNet, and the last layers were then fine-tuned towards the target task and domain. The model achieved an accuracy of 96.71% when evaluated on the binary classification task of predicting non-offensive or swastika/penis content in Battlefield "emblems". The results indicate that a CNN trained on ImageNet is applicable even when the target domain is very different from the pre-trained network's domain.
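A minimal torchvision sketch of this fine-tuning setup might look as follows; the weight-loading call, the frozen-feature strategy and the two-class head are common transfer-learning conventions, not the thesis code:

# GoogLeNet pre-trained on ImageNet, all layers frozen except a new binary head.
import torch.nn as nn
import torchvision

model = torchvision.models.googlenet(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False                        # keep pre-trained features fixed
model.fc = nn.Linear(model.fc.in_features, 2)      # new head: offensive vs. non-offensive
# only model.fc.parameters() are then passed to the optimiser for fine-tuning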
APA, Harvard, Vancouver, ISO, and other styles
49

Khlif, Wafa. "Multi-lingual scene text detection based on convolutional neural networks." Thesis, La Rochelle, 2022. http://www.theses.fr/2022LAROS022.

Full text
Abstract:
This dissertation explores text detection approaches via deep learning techniques towards achieving the goal of mining and retrieval of weakly structured content in scene images. First, this dissertation presents a method for detecting text in scene images based on multi-level connected component (CC) analysis and learning text component features via convolutional neural networks (CNN), followed by a graph-based grouping of overlapping text boxes. The features of the resulting raw text/non-text components at different granularity levels are learned via a CNN. The second contribution is inspired by YOLO, the real-time object detection system, and performs text detection and script identification simultaneously. The system presents a joint text detection and script identification approach based on casting the multi-script text detection task as an object detection problem, where the object is the script of the text. The joint text detection and script identification strategy is realized in a holistic approach using a single convolutional neural network, where the input data is the full image and the outputs are the text bounding boxes and their script. Textual feature extraction and script classification are performed jointly via a CNN. The experimental evaluation of these methods is performed on the Multi-Lingual Text (MLT) dataset, which we contributed to building. It consists of natural scene images with embedded text, such as street signs and advertisement boards, passing vehicles, and user photos from microblogs. This kind of image represents one of the most frequently encountered image types on the internet: images with embedded text in social media.
APA, Harvard, Vancouver, ISO, and other styles
50

Strömberg, Lucas. "Optimizing Convolutional Neural Networks for Inference on Embedded Systems." Thesis, Uppsala universitet, Signaler och system, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-444802.

Full text
Abstract:
Convolutional neural networks (CNNs) are state-of-the-art machine learning models used for various computer vision problems, such as image recognition. As these networks normally need a vast number of parameters, they can be computationally expensive, which complicates deployment on embedded hardware, especially if there are constraints on, for instance, latency, memory or power consumption. This thesis examines the CNN optimization methods pruning and quantization, in order to explore how they affect not only model accuracy but also possible inference latency speedup. Four baseline CNN models, based on popular and relevant architectures, were implemented and trained on the CIFAR-10 dataset. The networks were then quantized or pruned for various optimization parameters. All models can be successfully quantized to 5-bit weights and activations, or pruned to 70% sparsity, without any substantial effect on accuracy. The larger baseline models are generally more robust and can be quantized more aggressively; however, they are also more sensitive to low-bit activations. Moreover, for 8-bit integer quantization the networks were implemented on an ARM Cortex-A72 microprocessor, where inference latency was studied. These fixed-point models achieve up to 5.5x inference speedup on the ARM processor compared to the 32-bit floating-point baselines, with the larger models gaining more speedup from quantization than the smaller ones. While the results are not necessarily generalizable to different CNN architectures or datasets, the valuable insights obtained in this thesis can be used as starting points for further investigations into model optimization and its possible effects on accuracy and embedded inference latency.
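As a flavour of what 8-bit integer quantization does to the weights, here is a small numpy sketch of symmetric per-tensor affine quantization; it is illustrative only and ignores activation quantization and the fixed-point inference kernels actually run on the Cortex-A72:

# Quantize float weights to int8 with a per-tensor scale, then dequantize.
import numpy as np

w = np.random.randn(1000).astype(np.float32)          # stand-in weight tensor
scale = np.abs(w).max() / 127.0                       # symmetric per-tensor scale
q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale                  # dequantized weights
print(np.abs(w - w_hat).max())  # worst-case rounding error ~ scale / 2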
APA, Harvard, Vancouver, ISO, and other styles