
Dissertations / Theses on the topic 'LSTM-CNN'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 38 dissertations / theses for your research on the topic 'LSTM-CNN.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Gessle, Gabriel, and Simon Åkesson. "A comparative analysis of CNN and LSTM for music genre classification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260138.

Full text
Abstract:
The music industry has seen a great influx of new channels to browse and distribute music. This does not come without drawbacks. As the data rapidly increases, manual curation becomes a much more difficult task. Audio files have a plethora of features that could be used to make parts of this process a lot easier. It is possible to extract these features, but the best way to handle these for different tasks is not always known. This thesis compares the two deep learning models, convolutional neural network (CNN) and long short-term memory (LSTM), for music genre classification when trained using mel-frequency cepstral coefficients (MFCCs) in hopes of making audio data as useful as possible for future usage. These models were tested on two different datasets, GTZAN and FMA, and the results show that the CNN had a 56.0% and 50.5% prediction accuracy, respectively. This outperformed the LSTM model that instead achieved a 42.0% and 33.5% prediction accuracy.
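For readers who want to experiment with the setup this abstract describes, the sketch below shows one minimal way to extract MFCCs with librosa and define a small CNN and a small LSTM genre classifier in Keras. The layer sizes, the 13-coefficient MFCC setting and the 10-genre output are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: MFCC extraction plus two small genre classifiers (CNN and LSTM).
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

def extract_mfcc(path, n_mfcc=13, duration=30.0):
    """Load an audio clip and return an (n_frames, n_mfcc) MFCC matrix."""
    y, sr = librosa.load(path, duration=duration)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # time steps first

def build_cnn(input_shape, n_genres=10):
    # Treat the MFCC matrix as a single-channel "image".
    return models.Sequential([
        layers.Input(shape=(*input_shape, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(n_genres, activation="softmax"),
    ])

def build_lstm(input_shape, n_genres=10):
    # Treat the MFCC matrix as a sequence of frame vectors.
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.LSTM(64),
        layers.Dense(n_genres, activation="softmax"),
    ])
```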
APA, Harvard, Vancouver, ISO, and other styles
2

Graffi, Giacomo. "A novel approach for Credit Scoring using Deep Neural Networks with bank transaction data." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
With the PSD2 open banking revolution, FinTechs obtained a key role in the financial industry. This role calls for the investigation and development of new techniques, products and solutions to compete with other players in this area. The aim of this thesis is to investigate the applicability of state-of-the-art Deep Learning techniques to Credit Risk Modeling. To accomplish this, a PSD2-related synthetic and anonymized dataset has been used to simulate an application process with only one account per user. Firstly, a machine-readable representation of the bank accounts has been created, starting from the raw transaction data and scaling the variables using the quantile function. Afterwards, a Deep Neural Network has been created to capture the complex relations between the input variables and to extract information from the account representations. The proposed architecture accomplished the assigned tasks with a Gini index of 0.55, exploiting a convolutional encoder to extract features from the inputs and a recurrent decoder to analyze them.
APA, Harvard, Vancouver, ISO, and other styles
3

Olin, Per. "Evaluation of text classification techniques for log file classification." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166641.

Full text
Abstract:
System log files are filled with logged events, status codes, and other messages. By analyzing the log files, the system's current state can be determined and it can be established whether something went wrong during execution. Log file analysis has been studied for some time now, and recent studies have shown state-of-the-art performance using machine learning techniques. In this thesis, document classification solutions were tested on log files in order to classify regular system runs versus abnormal system runs. To solve this task, supervised and unsupervised learning methods were combined: Doc2Vec was used to extract document features, and Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based architectures were applied to the classification task. With the use of the machine learning models and preprocessing techniques, the tested models yielded an F1-score and accuracy above 95% when classifying log files.
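A minimal sketch of the kind of pipeline this abstract outlines, assuming gensim for Doc2Vec and Keras for the classifier. The dense classifier here is a stand-in for the CNN- and LSTM-based architectures the thesis actually evaluates, and all sizes are illustrative.

```python
# Minimal sketch: embed whole log files with Doc2Vec, then classify normal vs. abnormal runs.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from tensorflow.keras import layers, models

def train_doc2vec(log_files, vector_size=100):
    # log_files: list of token lists, one list per log file
    tagged = [TaggedDocument(words=tokens, tags=[i]) for i, tokens in enumerate(log_files)]
    return Doc2Vec(tagged, vector_size=vector_size, min_count=2, epochs=40)

def build_classifier(vector_size=100):
    # Binary output: regular run vs. abnormal run.
    model = models.Sequential([
        layers.Input(shape=(vector_size,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Usage sketch:
#   d2v = train_doc2vec(tokenised_logs)
#   X = np.stack([d2v.infer_vector(tokens) for tokens in tokenised_logs])
#   build_classifier().fit(X, labels, epochs=10)
```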
APA, Harvard, Vancouver, ISO, and other styles
4

Suresh, Sreerag. "An Analysis of Short-Term Load Forecasting on Residential Buildings Using Deep Learning Models." Thesis, Virginia Tech, 2020. http://hdl.handle.net/10919/99287.

Full text
Abstract:
Building energy load forecasting is becoming an increasingly important task with the rapid deployment of smart homes, integration of renewables into the grid and the advent of decentralized energy systems. Residential load forecasting has been a challenging task since the residential load is highly stochastic. Deep learning models have shown tremendous promise in the fields of time-series and sequential data and have been successfully used in the field of short-term load forecasting at the building level. Although other studies have looked at using deep learning models for building energy forecasting, most of those studies have looked at a limited number of homes or an aggregate load of a collection of homes. This study aims to address this gap and serves as an investigation into selecting the better deep learning model architecture for short-term load forecasting on three communities of residential buildings. The deep learning models CNN and LSTM have been used in the study. For 15-minute-ahead forecasting for a collection of homes, it was found that homes with higher variance were better predicted by CNN models, while LSTM showed better performance for homes with lower variance. The effect of adding weather variables on 24-hour-ahead forecasting was studied, and it was observed that adding weather parameters did not improve forecasting performance. In all the homes, deep learning models are shown to outperform the simple ANN model.
Master of Science
Building energy load forecasting is becoming an increasingly important task with the rapid deployment of smart homes, integration of renewables into the grid and the advent of decentralized energy systems. Residential load forecasting has been a challenging task since residential load is highly stochastic. Deep learning models have shown tremendous promise in the fields of time-series and sequential data and have been successfully used in the field of short-term load forecasting. Although other studies have looked at using deep learning models for building energy forecasting, most of those studies have looked at only a single home or an aggregate load of a collection of homes. This study aims to address this gap and serves as an analysis of short-term load forecasting on three communities of residential buildings. The model performances across all homes have been analyzed in detail. Deep learning models have been used in this study and their efficacy is measured against a simple ANN model.
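The sketch below illustrates one building block such a study relies on: turning a univariate load series into sliding windows and fitting a small LSTM that predicts the next 15-minute value. The window length (96 lags, i.e. one day of 15-minute readings) and layer sizes are assumptions for illustration, not the thesis settings.

```python
# Minimal sketch: sliding-window LSTM forecaster for 15-minute-ahead residential load.
import numpy as np
from tensorflow.keras import layers, models

def make_windows(series, n_lags=96):
    """Build (samples, n_lags, 1) inputs and next-step targets from a 1-D load array."""
    X, y = [], []
    for t in range(len(series) - n_lags):
        X.append(series[t:t + n_lags])
        y.append(series[t + n_lags])
    return np.array(X)[..., None], np.array(y)

def build_lstm_forecaster(n_lags=96):
    model = models.Sequential([
        layers.Input(shape=(n_lags, 1)),
        layers.LSTM(64),
        layers.Dense(1),  # 15-minute-ahead load value
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```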
APA, Harvard, Vancouver, ISO, and other styles
5

Terefe, Adisu Wagaw. "Handwritten Recognition for Ethiopic (Ge’ez) Ancient Manuscript Documents." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-288145.

Full text
Abstract:
A handwritten recognition system learns patterns from a given image of text. The recognition process usually combines a computer vision task with sequence learning techniques. Transcribing texts from scanned images remains a challenging problem, especially when the documents are highly degraded or contain excessive dust noise. Nowadays, several handwritten recognition systems exist, both commercial and free, especially for Latin-based languages. However, no prior system has been built for Ge’ez handwritten ancient manuscript documents, even though the language holds many mysteries of the past in the human history of science, architecture, medicine and astronomy. In this thesis, we present two separate recognition systems. (1) A character-level recognition system which combines computer vision for character segmentation from ancient books with a vanilla Convolutional Neural Network (CNN) to recognize characters. (2) An end-to-end, segmentation-free handwritten recognition system using a CNN and a Multi-Dimensional Recurrent Neural Network (MDRNN) with Connectionist Temporal Classification (CTC) for Ethiopic (Ge’ez) manuscript documents. The proposed character-level recognition model achieves 97.78% accuracy, while the second model provides an encouraging result that motivates further study of the language's properties for better recognition of all the ancient books.
APA, Harvard, Vancouver, ISO, and other styles
6

Rintala, Jonathan. "Speech Emotion Recognition from Raw Audio using Deep Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278858.

Full text
Abstract:
Traditionally, in Speech Emotion Recognition, models require a large number of manually engineered features and intermediate representations such as spectrograms for training. However, hand-engineering such features often requires both expert domain knowledge and resources. Recently, with the emerging paradigm of deep learning, end-to-end models that extract features themselves and learn directly from the raw speech signal have been explored. A previous approach has been to combine multiple parallel CNNs with different filter lengths to extract multiple temporal features from the audio signal, and then feed the resulting sequence to a recurrent block. Other recent work also reports high accuracies when utilizing local feature learning blocks (LFLBs) for reducing the dimensionality of a raw audio signal, extracting the most important information. Thus, this study combines the idea of LFLBs for feature extraction with a block of parallel CNNs with different filter lengths for capturing multi-temporal features; this is finally fed into an LSTM layer for global contextual feature learning. To the best of our knowledge, such a combined architecture has not yet been properly investigated. Further, this study investigates different configurations of such an architecture. The proposed model is then trained and evaluated on the well-known speech databases EmoDB and RAVDESS, both in a speaker-dependent and speaker-independent manner. The results indicate that the proposed architecture can produce results comparable with the state of the art, despite excluding data augmentation and advanced pre-processing. Three parallel CNN pipes were reported to yield the highest accuracy, together with a series of modified LFLBs that utilize average pooling and ReLU activation. This shows the power of leaving the feature learning up to the network and opens up for interesting future research on time complexity and the trade-off between introducing complexity in the pre-processing or in the model architecture itself.
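The combined architecture described above can be sketched with the Keras functional API as parallel 1-D convolutions with different filter lengths over the raw waveform, an LFLB-style pooling step, concatenation, and an LSTM. Filter counts, kernel sizes and the seven-class output are assumptions, not the configuration reported in the thesis.

```python
# Minimal sketch: multi-scale parallel Conv1D branches over raw audio, merged and fed to an LSTM.
from tensorflow.keras import layers, models

def build_multiscale_cnn_lstm(n_samples=48000, n_emotions=7, kernel_sizes=(8, 32, 128)):
    inp = layers.Input(shape=(n_samples, 1))          # raw waveform
    branches = []
    for k in kernel_sizes:
        x = layers.Conv1D(32, kernel_size=k, strides=4,
                          padding="same", activation="relu")(inp)
        x = layers.AveragePooling1D(pool_size=4)(x)    # LFLB-style dimensionality reduction
        branches.append(x)
    x = layers.Concatenate()(branches)                 # multi-temporal features
    x = layers.LSTM(128)(x)                            # global contextual learning
    out = layers.Dense(n_emotions, activation="softmax")(x)
    return models.Model(inp, out)
```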
APA, Harvard, Vancouver, ISO, and other styles
7

Kapoor, Prince. "Shoulder Keypoint-Detection from Object Detection." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/38015.

Full text
Abstract:
This thesis presents a detailed study of different Convolutional Neural Network (CNN) architectures, which have helped computer vision researchers achieve state-of-the-art performance on image analysis challenges such as classification, detection and segmentation. With the advent of deep learning, CNNs have been used in almost all computer vision applications, which is why there is a real need to understand the details of these feature extractors and to examine the pros and cons of each one. For our experimentation, we explore an object detection task using a model architecture that maintains a sweet spot between computational cost and accuracy: an LSTM-Decoder. The model was tested with different CNN feature extractors, whose pros and cons were assessed in various scenarios. The results obtained on different datasets show that the CNN plays a major role in obtaining higher accuracy, and we also achieve accuracy comparable to the state of the art on a pedestrian detection dataset. In addition to object detection, we implement two different model architectures that find shoulder keypoints. The first idea is as follows: using the annotation produced by the object detector, a small cropped image is generated and fed into a small cascade network trained to detect shoulder keypoints. The second strategy is to take the same object detection model and fine-tune its weights to predict shoulder keypoints. We report results for shoulder keypoint detection; the idea could be extended to full-body pose estimation by modifying the cascaded network accordingly, which is an important topic for the future work of this thesis.
APA, Harvard, Vancouver, ISO, and other styles
8

Engström, Olof. "Deep Learning for Anomaly Detection in Microwave Links : Challenges and Impact on Weather Classification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-276676.

Full text
Abstract:
Artificial intelligence is receiving a great deal of attention in various fields of science and engineering due to its promising applications. In today’s society, weather classification models with high accuracy are of utmost importance. An alternative to using conventional weather radars is to use measured attenuation data in microwave links as the input to deep learning-based weather classification models. Detecting anomalies in the measured attenuation data is of great importance as the output of a classification model cannot be trusted if the input to the classification model contains anomalies. Designing an accurate classification model poses some challenges due to the absence of predefined features to discriminate among the various weather conditions, and due to specific domain requirements in terms of execution time and detection sensitivity. In this thesis we investigate the relationship between anomalies in signal attenuation data, which is the input to a weather classification model, and the model’s misclassifications. To this end, we propose and evaluate two deep learning models based on long short-term memory networks (LSTM) and convolutional neural networks (CNN) for anomaly detection in a weather classification problem. We evaluate the feasibility and possible generalizations of the proposed methodology in an industrial case study at Ericsson AB, Sweden. The results show that both proposed methods can detect anomalies that correlate with misclassifications made by the weather classifier. Although the LSTM performed better than the CNN with regards to top performance on one link and average performance across all 5 tested links, the CNN performance is shown to be more consistent.
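The abstract does not spell out the exact detection formulation, so the sketch below shows one common LSTM-based scheme that fits the description: an autoencoder over attenuation windows whose reconstruction error is used as an anomaly score. Window length and layer sizes are assumptions, not the thesis configuration.

```python
# Hedged illustration: LSTM autoencoder for anomaly scoring of attenuation windows.
import numpy as np
from tensorflow.keras import layers, models

def build_lstm_autoencoder(window=64, n_features=1):
    model = models.Sequential([
        layers.Input(shape=(window, n_features)),
        layers.LSTM(32),                         # encode the window
        layers.RepeatVector(window),             # expand back to sequence length
        layers.LSTM(32, return_sequences=True),
        layers.TimeDistributed(layers.Dense(n_features)),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def anomaly_scores(model, windows):
    """Mean squared reconstruction error per window; high values suggest anomalies."""
    recon = model.predict(windows, verbose=0)
    return np.mean((windows - recon) ** 2, axis=(1, 2))
```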
APA, Harvard, Vancouver, ISO, and other styles
9

Chen, Yani. "Deep Learning based 3D Image Segmentation Methods and Applications." Ohio University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1547066297047003.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Lin, Alvin. "Video Based Automatic Speech Recognition Using Neural Networks." DigitalCommons@CalPoly, 2020. https://digitalcommons.calpoly.edu/theses/2343.

Full text
Abstract:
Neural network approaches have become popular in the field of automatic speech recognition (ASR). Most ASR methods use audio data to classify words. Lip-reading ASR techniques utilize only video data, which compensates for noisy environments where audio may be compromised. A comprehensive approach to video-based ASR is developed, including the vetting of datasets and the development of a preprocessing chain. This approach is based on neural networks, namely 3D convolutional neural networks (3D-CNN) and long short-term memory (LSTM); these types of neural networks are designed to take in temporal data such as videos. Various combinations of neural network architectures and preprocessing techniques are explored. The best-performing neural network architecture, a CNN with bidirectional LSTM, compares favorably against recent works on video-based ASR.
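A minimal sketch of the architecture family named as best-performing above: a 3D CNN over short video clips followed by a bidirectional LSTM and a per-clip word classifier. The input shape, filter counts and vocabulary size are illustrative assumptions.

```python
# Minimal sketch: 3D-CNN frames encoder + bidirectional LSTM for per-clip word classification.
from tensorflow.keras import layers, models

def build_3dcnn_bilstm(frames=29, height=64, width=64, channels=1, vocab=10):
    model = models.Sequential([
        layers.Input(shape=(frames, height, width, channels)),
        layers.Conv3D(32, (3, 5, 5), padding="same", activation="relu"),
        layers.MaxPooling3D(pool_size=(1, 2, 2)),    # pool space, keep the time axis
        layers.Conv3D(64, (3, 3, 3), padding="same", activation="relu"),
        layers.MaxPooling3D(pool_size=(1, 2, 2)),
        layers.TimeDistributed(layers.Flatten()),     # one feature vector per frame
        layers.Bidirectional(layers.LSTM(128)),
        layers.Dense(vocab, activation="softmax"),    # one word label per clip
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```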
APA, Harvard, Vancouver, ISO, and other styles
11

Lagerhjelm, Linus. "Extracting Information from Encrypted Data using Deep Neural Networks." Thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-155904.

Full text
Abstract:
In this paper we explore various approaches to using deep neural networks to perform cryptanalysis, with the ultimate goal of having a deep neural network decipher encrypted data. We use long short-term memory networks to try to decipher encrypted text and we use a convolutional neural network to perform classification tasks on encrypted MNIST images. We find that although the network is unable to decipher encrypted data, it is able to perform classification on encrypted data. We also find that the network's performance depends on which key was used to encrypt the data. These findings could be valuable for further research into the topic of cryptanalysis using deep neural networks.
APA, Harvard, Vancouver, ISO, and other styles
12

Volný, Miloš. "Využití umělé inteligence jako podpory pro rozhodování v podniku." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2019. http://www.nusl.cz/ntk/nusl-399447.

Full text
Abstract:
This thesis is concerned with future trend prediction on capital markets on the basis of neural networks. The use of convolutional and recurrent neural networks, Elliott wave theory and scalograms for predicting future trends on capital markets is discussed. The aim of this thesis is to propose a novel approach to future trend prediction based on Elliott's wave theory. The proposed approach is based on classifying chosen patterns from Elliott's theory with a convolutional neural network. To this end, scalograms of the chosen Elliott patterns are created by applying the continuous wavelet transform to parts of the historical price time series of chosen stocks.
APA, Harvard, Vancouver, ISO, and other styles
13

Mazhar, Osama. "Vision-based human gestures recognition for human-robot interaction." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS044.

Full text
Abstract:
In the light of the factories of the future, to ensure productive, safe and effective interaction between robots and human coworkers, it is imperative that the robot extracts the essential information about the coworker. To address this, deep learning solutions are explored and a reliable human gesture detection framework is developed in this work. Our framework is able to robustly detect static hand gestures as well as upper-body dynamic gestures. For static hand gesture detection, openpose is integrated with the Kinect V2 to obtain a pseudo-3D human skeleton. With the help of 10 volunteers, we recorded an image dataset, opensign, which contains Kinect V2 RGB and depth images of 10 alpha-numeric static hand gestures taken from the American Sign Language. The "Inception V3" neural network is adapted and trained to detect static hand gestures in real time. Subsequently, we extend our gesture detection framework to recognize upper-body dynamic gestures. A spatial-attention-based dynamic gesture detection strategy is proposed that employs a multi-modal "Convolutional Neural Network - Long Short-Term Memory" deep network to extract spatio-temporal dependencies in pure RGB video sequences. The exploited convolutional neural network blocks are pre-trained on our static hand gestures dataset opensign, which allows efficient extraction of hand features. Our spatial attention module focuses on large-scale movements of the upper limbs as well as on hand images for subtle hand and finger movements, to efficiently distinguish gesture classes. This module additionally exploits the 2D upper-body pose to estimate the distance of the user from the sensor for scale normalization and to determine the parameters of the hand bounding boxes without the need for a depth sensor. The information typically extracted from a depth camera in similar strategies is learned from the opensign dataset, so the proposed gesture recognition strategy can be implemented on any system with a monocular camera. Afterwards, we briefly explore 3D human pose estimation strategies for monocular cameras. To estimate 3D human pose, a hybrid strategy is proposed which combines the merits of discriminative 2D pose estimators with those of model-based generative approaches. Our method optimizes an objective function that minimizes the discrepancy between the position- and scale-normalized 2D pose obtained from openpose and a virtual 2D projection of a kinematic human model. For real-time human-robot interaction, an asynchronous distributed system is developed to integrate our static hand gesture detector module with an open-source physical human-robot interaction library, OpenPHRI. We validate the performance of the proposed framework through a teach-by-demonstration experiment with a robotic manipulator.
APA, Harvard, Vancouver, ISO, and other styles
14

Shaif, Ayad. "Predictive Maintenance in Smart Agriculture Using Machine Learning : A Novel Algorithm for Drift Fault Detection in Hydroponic Sensors." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-42270.

Full text
Abstract:
The success of Internet of Things solutions has allowed the establishment of new applications such as smart hydroponic agriculture. One typical problem in such an application is the rapid degradation of the deployed sensors. Traditionally, this problem is resolved by frequent manual maintenance, which is considered to be ineffective and may harm the crops in the long run. The main purpose of this thesis was to propose a machine learning approach for automating the detection of sensor fault drifts. In addition, the solution's operability was investigated in a cloud computing environment in terms of response time. This thesis proposes a detection algorithm that utilizes RNNs to predict sensor drifts from time-series data streams. The detection algorithm was named Predictive Sliding Detection Window (PSDW) and consists of both forecasting and classification models. Three different RNN algorithms, i.e., LSTM, CNN-LSTM, and GRU, were designed to predict sensor drifts using forecasting and classification techniques. The algorithms were compared against each other in terms of relevant accuracy metrics for forecasting and classification. The operability of the solution was investigated by developing a web server that hosted the PSDW algorithm on an AWS computing instance. The resulting forecasting and classification algorithms were able to make reasonably accurate predictions for this particular scenario. More specifically, the forecasting algorithms achieved relatively low RMSE values of ~0.6, while the classification algorithms obtained an average F1-score and accuracy of ~80%, but with a high standard deviation. However, the response time was ~5700% slower during the simulation of the HTTP requests. The obtained results suggest the need for future investigations to improve the accuracy of the models and to experiment with other computing paradigms for more reliable deployments.
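A hedged sketch of the two building blocks such an algorithm combines; the exact PSDW windowing logic is not reproduced here. One GRU model forecasts the next sensor reading and another classifies a window as drifting or healthy; window length and unit counts are assumptions.

```python
# Hedged sketch: GRU forecaster plus GRU drift classifier over sensor windows.
from tensorflow.keras import layers, models

def build_gru_forecaster(window=48, n_features=1):
    m = models.Sequential([
        layers.Input(shape=(window, n_features)),
        layers.GRU(32),
        layers.Dense(1),                          # predicted next sensor reading
    ])
    m.compile(optimizer="adam", loss="mse")
    return m

def build_gru_classifier(window=48, n_features=1):
    m = models.Sequential([
        layers.Input(shape=(window, n_features)),
        layers.GRU(32),
        layers.Dense(1, activation="sigmoid"),    # drift vs. healthy window
    ])
    m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return m
```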
APA, Harvard, Vancouver, ISO, and other styles
15

Evholt, David, and Oscar Larsson. "Generative Adversarial Networks and Natural Language Processing for Macroeconomic Forecasting." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273422.

Full text
Abstract:
Macroeconomic forecasting is a classic problem, today most often modeled using time series analysis. Few attempts have been made using machine learning methods, and even fewer incorporating unconventional data, such as that from social media. In this thesis, a Generative Adversarial Network (GAN) is used to predict U.S. unemployment, beating the ARIMA benchmark on all horizons. Furthermore, attempts at using Twitter data and the Natural Language Processing (NLP) model DistilBERT are performed. While these attempts do not beat the benchmark, they do show promising results with predictive power. The models are also tested at predicting the U.S. stock index S&P 500. For these models, the Twitter data does improve the accuracy and shows the potential of social media data when predicting a more erratic index with less seasonality that is more responsive to current trends in public discourse. The results also show that Twitter data can be used to predict trends in both unemployment and the S&P 500 index. This sets the stage for further research into NLP-GAN models for macroeconomic predictions using social media data.
APA, Harvard, Vancouver, ISO, and other styles
16

Holm, Noah, and Emil Plynning. "Spatio-temporal prediction of residential burglaries using convolutional LSTM neural networks." Thesis, KTH, Geoinformatik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229952.

Full text
Abstract:
The low number of solved residential burglary crimes calls for new and innovative methods in the prevention and investigation of these cases. There were 22 600 reported residential burglaries in Sweden in 2017, but only four to five percent of these will ever be solved. There are many initiatives in both Sweden and abroad for decreasing the number of residential burglaries, and one of the areas being tested is the use of prediction methods for more efficient preventive actions. This thesis is an investigation of a potential prediction method that uses neural networks to identify areas with a higher risk of burglaries on a daily basis. The model uses reported burglaries to learn patterns in both space and time. The rationale for the existence of such patterns is based on near-repeat theories in criminology, which state that after a burglary both the burgled victim and an area around that victim have an increased risk of additional burglaries. The work has been conducted in cooperation with the Swedish Police Authority. The machine learning is implemented with convolutional long short-term memory (LSTM) neural networks with max pooling in three dimensions that learn from ten years of residential burglary data (2007-2016) in a study area in Stockholm, Sweden. The model's accuracy is measured by performing daily predictions of burglaries during 2017. It classifies cells in a 36x36 grid of 600-meter square cells as areas with elevated risk or not. By classifying 4% of all grid cells during the year as risk areas, 43% of all burglaries are correctly predicted. The performance of the model could potentially be improved by further configuration of the neural network's parameters, along with the use of more data on factors correlated with burglaries, for instance weather. Consequently, further work in these areas could increase the accuracy. The conclusion is that neural networks, or machine learning in general, could be a powerful and innovative tool for the Swedish Police Authority to predict and moreover prevent certain crime. This thesis serves as a first prototype of how such a system could be implemented and used.
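A minimal Keras sketch of the model family described above: a convolutional LSTM over a sequence of daily 36x36 crime grids with 3-D max pooling, producing a per-cell risk map. The sequence length and filter counts are assumptions, not the configuration used in the thesis.

```python
# Minimal sketch: ConvLSTM over daily burglary grids, predicting next-day per-cell risk.
from tensorflow.keras import layers, models

def build_convlstm(seq_len=30, grid=36):
    model = models.Sequential([
        layers.Input(shape=(seq_len, grid, grid, 1)),          # sequence of daily grids
        layers.ConvLSTM2D(16, kernel_size=3, padding="same", return_sequences=True),
        layers.MaxPooling3D(pool_size=(2, 1, 1)),               # pool over the time axis
        layers.ConvLSTM2D(16, kernel_size=3, padding="same", return_sequences=False),
        layers.Conv2D(1, kernel_size=1, activation="sigmoid"),  # per-cell elevated-risk probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```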
APA, Harvard, Vancouver, ISO, and other styles
17

Näslund, Per. "Artificial Neural Networks in Swedish Speech Synthesis." Thesis, KTH, Tal-kommunikation, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-239350.

Full text
Abstract:
Text-to-speech (TTS) systems have entered our daily lives in the form of smart assistants and many other applications. Contemporary research applies machine learning and artificial neural networks (ANNs) to synthesize speech. It has been shown that these systems outperform the older concatenative and parametric methods. In this paper, ANN-based methods for speech synthesis are explored and one of the methods is implemented for the Swedish language. The implemented method is dubbed “Tacotron” and is a first step towards end-to-end ANN-based TTS which puts many different ANN techniques to work. The resulting system is compared to a parametric TTS through a strength-of-preference test that is carried out with 20 Swedish-speaking subjects. A statistically significant preference for the ANN-based TTS is found. Test subjects indicate that the ANN-based TTS performs better than the parametric TTS when it comes to audio quality and naturalness but sometimes lacks in intelligibility.
APA, Harvard, Vancouver, ISO, and other styles
18

Ďuriš, Denis. "Detekce ohně a kouře z obrazového signálu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-412968.

Full text
Abstract:
This diploma thesis deals with the detection of fire and smoke from an image signal. The approach of this work uses a combination of convolutional and recurrent neural networks. The machine learning models created in this work contain inception modules and blocks of long short-term memory. The research part describes selected machine learning models used in solving the problem of fire detection in static and dynamic image data. As part of the solution, a dataset containing videos and still images used to train the designed neural networks was created. The results of this approach are evaluated in the conclusion.
APA, Harvard, Vancouver, ISO, and other styles
19

Broomé, Sofia. "Objectively recognizing human activity in body-worn sensor data with (more or less) deep neural networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210243.

Full text
Abstract:
This thesis concerns the application of different artificial neural network architectures on the classification of multivariate accelerometer time series data into activity classes such as sitting, lying down, running, or walking. There is a strong correlation between increased health risks in children and their amount of daily screen time (as reported in questionnaires). The dependency is not clearly understood, as there are no such dependencies reported when the sedentary (idle) time is measured objectively. Consequently, there is an interest from the medical side to be able to perform such objective measurements. To enable large studies the measurement equipment should ideally be low-cost and non-intrusive. The report investigates how well these movement patterns can be distinguished given a certain measurement setup and a certain network structure, and how well the networks generalise to noisier data. Recurrent neural networks are given extra attention among the different networks, since they are considered well suited for data of sequential nature. Close to state-of-the-art results (95% weighted F1-score) are obtained for the tasks with 4 and 5 classes, which is notable since a considerably smaller number of sensors is used than in the previously published results. Another contribution of this thesis is that a new labeled dataset with 12 activity categories is provided, consisting of around 6 hours of recordings, comparable in number of samples to benchmarking datasets. The data collection was made in collaboration with the Department of Public Health at Karolinska Institutet.
APA, Harvard, Vancouver, ISO, and other styles
20

Gopchandani, Sandhya. "Using Word Embeddings to Explore the Language of Depression on Twitter." ScholarWorks @ UVM, 2019. https://scholarworks.uvm.edu/graddis/1072.

Full text
Abstract:
How do people discuss mental health on social media? Can we train a computer program to recognize differences between discussions of depression and other topics? Can an algorithm predict that someone is depressed from their tweets alone? In this project, we collect tweets referencing “depression” and “depressed” over a seven-year period, and train word embeddings to characterize linguistic structures within the corpus. We find that neural word embeddings capture the contextual differences between “depressed” and “healthy” language. We also looked at how the context around words may have changed over time, to get a deeper understanding of contextual shifts in word usage. Finally, we trained a deep learning network on a much smaller collection of tweets authored by individuals formally diagnosed with depression. The best-performing model for the prediction task is a Convolutional LSTM (CNN-LSTM) model with an F-score of 69% on test data. The results suggest social media could serve as a valuable screening tool for mental health.
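A minimal sketch of a CNN-LSTM text classifier in the spirit of the best model reported above: word embeddings, a 1-D convolution over the tweet, an LSTM, and a binary output. The vocabulary size, sequence length and embedding dimension are assumptions.

```python
# Minimal sketch: CNN-LSTM over tokenised tweets with a binary depressed/not-depressed output.
from tensorflow.keras import layers, models

def build_cnn_lstm(vocab_size=20000, max_len=50, embed_dim=100):
    model = models.Sequential([
        layers.Input(shape=(max_len,)),               # integer-encoded tokens
        layers.Embedding(vocab_size, embed_dim),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(64),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```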
APA, Harvard, Vancouver, ISO, and other styles
21

Mukhedkar, Dhananjay. "Polyphonic Music Instrument Detection on Weakly Labelled Data using Sequence Learning Models." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279060.

Full text
Abstract:
Polyphonic or multiple music instrument detection is a difficult problem compared to detecting single or solo instruments in an audio recording. As music is time series data, it can be modelled using sequence learning methods within deep learning. Recently, temporal convolutional networks (TCN) have been shown to outperform conventional recurrent neural networks (RNN) on various sequence modelling tasks. Though there have been significant improvements in deep learning methods, data scarcity becomes a problem in training large-scale models. Weakly labelled data is an alternative where a clip is annotated for the presence or absence of instruments without specifying the times at which an instrument is sounding. This study investigates how a TCN model compares to a Long Short-Term Memory (LSTM) model when trained on a weakly labelled dataset. The results showed successful training of both models along with generalisation on a separate dataset. The comparison showed that the TCN performed better than the LSTM, but only marginally. Therefore, from the experiments carried out, it could not be explicitly concluded that the TCN is convincingly a better choice than the LSTM in the context of instrument detection, but it is definitely a strong alternative.
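A hedged sketch of the two model families compared above, applied to clip-level multi-label instrument tagging on weakly labelled audio features: a small TCN-style stack of dilated causal convolutions (without the residual blocks of a full TCN) and a plain LSTM baseline. The feature dimensions and instrument count are assumptions.

```python
# Hedged sketch: TCN-style dilated Conv1D stack vs. LSTM for clip-level multi-label tagging.
from tensorflow.keras import layers, models

def build_tcn(n_frames=500, n_features=128, n_instruments=20):
    inp = layers.Input(shape=(n_frames, n_features))
    x = inp
    for d in (1, 2, 4, 8):                         # growing dilation = growing temporal context
        x = layers.Conv1D(64, kernel_size=3, dilation_rate=d,
                          padding="causal", activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)          # clip-level summary fits weak labels
    out = layers.Dense(n_instruments, activation="sigmoid")(x)
    return models.Model(inp, out)

def build_lstm(n_frames=500, n_features=128, n_instruments=20):
    return models.Sequential([
        layers.Input(shape=(n_frames, n_features)),
        layers.LSTM(64),
        layers.Dense(n_instruments, activation="sigmoid"),
    ])
```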
Polyfonisk eller multipel musikinstrumentdetektering är ett svårt problem jämfört med att detektera enstaka eller soloinstrument i en ljudinspelning. Eftersom musik är tidsseriedata kan den modelleras med hjälp av sekvensinlärningsmetoder inom djup inlärning. Nyligen har ’Temporal Convolutional Network’ (TCN) visat sig överträffa konventionella ’Recurrent Neural Network’ (RNN) på flertalet sekvensmodelleringsuppgifter. Även om det har skett betydande förbättringar i metoder för djup inlärning, blir dataknapphet ett problem vid utbildning av storskaliga modeller. Svagt märkta data är ett alternativ där ett klipp kommenteras för närvaro av frånvaro av instrument utan att ange de tidpunkter då ett instrument låter. Denna studie undersöker hur TCN-modellen jämförs med en ’Long Short-Term Memory’ (LSTM) -modell medan den tränas i svagt märkta datasätt. Resultaten visade framgångsrik utbildning av båda modellerna tillsammans med generalisering i en separat datasats. Jämförelsen visade att TCN presterade bättre än LSTM, men endast marginellt. Därför kan man från de genomförda experimenten inte uttryckligen dra slutsatsen om TCN övertygande är ett bättre val jämfört med LSTM i samband med instrumentdetektering, men definitivt ett starkt alternativ.
APA, Harvard, Vancouver, ISO, and other styles
22

Johansson, Alexander, and Oscar Sandberg. "A COMPARATIVE STUDY OF DEEP-LEARNING APPROACHES FOR ACTIVITY RECOGNITION USING SENSOR DATA IN SMART OFFICE ENVIRONMENTS." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20928.

Full text
Abstract:
The purpose of the study is to compare three deep learning networks with each other to evaluate which network can produce the highest prediction accuracy. Accuracy is measured as the networks try to predict the number of people in the room where observation takes place. In addition to comparing the three deep learning networks with each other, we also compare the networks with a traditional machine learning approach, in order to find out whether deep learning methods perform better than traditional methods. This study uses design and creation, a methodology that places great emphasis on developing an IT product and uses the product as its contribution to new knowledge. The methodology has five different phases; we chose to iterate between the development and evaluation phases. Observation is the data generation method used to collect data. Data generation lasted for three weeks, resulting in 31287 rows of data recorded in our database. One of our deep learning networks produced an accuracy of 78.2%, while the two other approaches produced accuracies of 45.6% and 40.3% respectively. For our traditional method, decision trees with two different formulas were used; they produced accuracies of 61.3% and 57.2% respectively. The result of this thesis shows that, out of the three deep learning networks included in this study, only one is able to produce a higher predictive accuracy than the traditional ML approaches. This result does not necessarily mean that deep learning approaches in general are able to produce a higher predictive accuracy than traditional machine learning approaches. Further work includes additional experimentation with the dataset and hyperparameters, gathering and properly validating more data, and comparing more deep learning and machine learning approaches.
APA, Harvard, Vancouver, ISO, and other styles
23

Hedar, Sara. "Applying Machine Learning Methods to Predict the Outcome of Shots in Football." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-414774.

Full text
Abstract:
The thesis investigates a publicly available dataset which covers more than three million events in football matches. The aim of the study is to train machine learning models capable of modeling the relationship between a shot event and its outcome, that is, to predict if a football shot will result in a goal or not. By representing the shot in different ways, the aim is to draw conclusions regarding what elements of a shot allow for a good prediction of its outcome. The shot representation was varied both by including different numbers of events preceding the shot and by varying the set of features describing each event. The study shows that the performance of the machine learning models benefits from including events preceding the shot. The highest predictive performance was achieved by a long short-term memory neural network trained on the shot event and six events preceding the shot. The features which were found to have the largest positive impact on the shot events were the precision of the event, the position on the field and how the player was in contact with the ball. The size of the dataset was also evaluated and the results suggest that it is sufficiently large for the size of the networks evaluated.
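A minimal sketch of the best-performing setup described above, assuming each sample consists of the shot plus the six preceding events and that each event has already been encoded as a fixed-length feature vector; the feature size and unit counts are illustrative.

```python
# Minimal sketch: LSTM over a short event sequence, predicting goal vs. no goal for the shot.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_shot_lstm(n_events=7, n_event_features=20):
    model = models.Sequential([
        layers.Input(shape=(n_events, n_event_features)),   # shot + six preceding events
        layers.LSTM(32),
        layers.Dense(1, activation="sigmoid"),               # probability the shot is a goal
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC()])
    return model
```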
APA, Harvard, Vancouver, ISO, and other styles
24

Hamerník, Pavel. "Využití hlubokého učení pro rozpoznání textu v obrazu grafického uživatelského rozhraní." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403823.

Full text
Abstract:
Optical character recognition (OCR) has been a topic of interest for many years. It is defined as the process of digitizing a document image into a sequence of characters. Despite decades of intense research, OCR systems with capabilities matching those of humans still remain an open challenge. This work presents the design and implementation of such a system, capable of detecting text in graphical user interfaces.
APA, Harvard, Vancouver, ISO, and other styles
25

Kvita, Jakub. "Popis fotografií pomocí rekurentních neuronových sítí." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255324.

Full text
Abstract:
This thesis deals with the automatic generation of image captions using several kinds of neural networks. The work is based on papers from the MS COCO Captioning Challenge 2015 and on character-level language models popularized by A. Karpathy. The proposed model is a combination of a convolutional and a recurrent neural network in an encoder-decoder architecture. The vector representing the encoded image is passed to the language model as the memory values of the LSTM layers in the network. The thesis examines how well a model with such a simple architecture is able to describe images and how it compares with other current models. One of the conclusions of the work is that the proposed architecture is not sufficient for general image captioning.
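A minimal sketch of the encoder-decoder idea described above: a pre-extracted CNN image vector is projected into the initial LSTM memory of a character-level language model that emits the caption one character at a time. The dimensions and character vocabulary size are illustrative assumptions.

```python
# Minimal sketch: CNN image features seed the LSTM state of a character-level caption decoder.
from tensorflow.keras import layers, models

def build_captioner(img_feat_dim=2048, n_chars=64, max_len=200, units=256):
    img_in = layers.Input(shape=(img_feat_dim,))            # pre-extracted CNN features
    h0 = layers.Dense(units, activation="tanh")(img_in)     # initial hidden state
    c0 = layers.Dense(units, activation="tanh")(img_in)     # initial cell (memory) state

    chars_in = layers.Input(shape=(max_len,))                # previously generated characters
    x = layers.Embedding(n_chars, 64)(chars_in)
    x = layers.LSTM(units, return_sequences=True)(x, initial_state=[h0, c0])
    out = layers.TimeDistributed(layers.Dense(n_chars, activation="softmax"))(x)
    return models.Model([img_in, chars_in], out)
```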
APA, Harvard, Vancouver, ISO, and other styles
26

Kramář, Denis. "Analýza zvukových nahrávek pomocí hlubokého učení." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2021. http://www.nusl.cz/ntk/nusl-442571.

Full text
Abstract:
This master's thesis deals with audio classification of chainsaw logging sounds in a natural environment, using mainly convolutional neural networks. First, the theory of graphical representations of audio signals is discussed. The following part is devoted to the field of machine learning. The third chapter surveys existing work dealing with this problem. The practical part presents the dataset used and the neural networks tested. The final results are compared in terms of achieved accuracy and ROC curves. The robustness of the presented solutions was tested with the proposed detection program and evaluated using objective criteria.
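The following is only an illustrative sketch of this kind of pipeline, not the thesis's network: a small CNN classifies spectrogram patches as chainsaw versus background, and the result is evaluated with a ROC curve as mentioned in the abstract. The input shape, layer sizes and the placeholder validation data are assumptions.

```python
# Hedged sketch: CNN on spectrogram patches + ROC evaluation with scikit-learn.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.metrics import roc_curve, auc

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),          # assumed log-mel spectrogram patch
    layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),      # chainsaw vs background
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# ROC evaluation on held-out data (random placeholders here, not the thesis's data)
x_val = np.random.rand(8, 128, 128, 1).astype("float32")
y_val = np.array([0, 1] * 4)
fpr, tpr, _ = roc_curve(y_val, model.predict(x_val).ravel())
print("AUC:", auc(fpr, tpr))
```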
APA, Harvard, Vancouver, ISO, and other styles
27

Albert, Florea George, and Filip Weilid. "Deep Learning Models for Human Activity Recognition." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20201.

Full text
Abstract:
AMI Meeting Corpus (AMI) -databasen används för att undersöka igenkännande av gruppaktivitet. AMI Meeting Corpus (AMI) -databasen ger forskare fjärrstyrda möten och naturliga möten i en kontorsmiljö; mötescenario i ett fyra personers stort kontorsrum. För att uppnå gruppaktivitetsigenkänning användes bildsekvenser från videos och 2-dimensionella audiospektrogram från AMI-databasen. Bildsekvenserna är RGB-färgade bilder och ljudspektrogram har en färgkanal. Bildsekvenserna producerades i batcher så att temporala funktioner kunde utvärderas tillsammans med ljudspektrogrammen. Det har visats att inkludering av temporala funktioner både under modellträning och sedan förutsäga beteende hos en aktivitet ökar valideringsnoggrannheten jämfört med modeller som endast använder rumsfunktioner [1]. Deep learning-arkitekturer har implementerats för att känna igen olika mänskliga aktiviteter i AMI-kontorsmiljön med hjälp av extraherade data från AMI-databasen. Neurala nätverksmodellerna byggdes med hjälp av Keras API tillsammans med TensorFlow-biblioteket. Det finns olika typer av neurala nätverksarkitekturer. Arkitekturerna som undersöktes i detta projektet var Residual Neural Network, Visual Geometry Group 16, Inception V3 och RCNN (LSTM). ImageNet-vikter har använts för att initialisera vikterna för neurala nätverks-basmodeller. ImageNet-vikterna tillhandahålls av Keras API och är optimerade för varje basmodell [2]. Basmodellerna använder ImageNet-vikter när de extraherar funktioner från inmatningsdata. Funktionsextraktionen med hjälp av ImageNet-vikter eller slumpmässiga vikter tillsammans med basmodellerna visade lovande resultat. Både Deep Learning-användningen av täta skikt och LSTM spatio-temporala sekvenspredikering implementerades framgångsrikt.
The Augmented Multi-party Interaction (AMI) Meeting Corpus database is used to investigate group activity recognition in an office environment. The AMI Meeting Corpus database provides researchers with remote-controlled meetings and natural meetings in an office environment; the meeting scenario is a four-person office room. To achieve group activity recognition, video frames and 2-dimensional audio spectrograms were extracted from the AMI database. The video frames were RGB color images and the audio spectrograms had one color channel. The video frames were produced in batches so that temporal features could be evaluated together with the audio spectrograms. It has been shown that including temporal features both during model training and when predicting the behavior of an activity increases validation accuracy compared to models that only use spatial features [1]. Deep learning architectures have been implemented to recognize different human activities in the AMI office environment using the extracted data from the AMI database. The neural network models were built using the Keras API together with the TensorFlow library. There are different types of neural network architectures; the architecture types investigated in this project were Residual Neural Network, Visual Geometry Group 16, Inception V3 and RCNN (Recurrent Neural Network). ImageNet weights were used to initialize the weights of the neural network base models. The ImageNet weights are provided by the Keras API and are optimized for each base model [2]. The base models use ImageNet weights when extracting features from the input data. Feature extraction using ImageNet weights or random weights together with the base models showed promising results. Both the deep learning approach using dense layers and the LSTM spatio-temporal sequence prediction were implemented successfully.
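A hedged sketch of the transfer-learning setup the abstract describes: an ImageNet-initialised base model (InceptionV3 here, one of the architectures named above) extracts a feature vector per video frame, and an LSTM classifies the resulting sequence. The frame count, image size, number of activity classes and LSTM width are assumptions.

```python
# Sketch: frozen ImageNet base model as per-frame feature extractor + LSTM.
import tensorflow as tf
from tensorflow.keras import layers, models

FRAMES, H, W, N_CLASSES = 16, 224, 224, 4

base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg")
base.trainable = False                           # fixed feature extractor

frames = layers.Input(shape=(FRAMES, H, W, 3))
feats = layers.TimeDistributed(base)(frames)     # (batch, FRAMES, 2048)
x = layers.LSTM(128)(feats)
out = layers.Dense(N_CLASSES, activation="softmax")(x)

model = models.Model(frames, out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```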
APA, Harvard, Vancouver, ISO, and other styles
28

Hsu, Tsu-Jui, and 許祖瑞. "Programmable CNN LSTM ASIC Design for Biomedical Application." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/x35asn.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Electronics
Academic year 107 (2018/19)
Mobile health devices are a key factor in personal health care: they can analyze the user's physical well-being instantly and combine Artificial Intelligence, Big Data, the Internet of Things, sensors, and more. Unlike cloud computing, a mobile health device has very limited computing resources and computing power, so it needs to rely on edge computing, which requires low-power and real-time computation. To achieve this, we design an ASIC that can process multiple deep learning networks: Convolutional Neural Networks, Long Short-Term Memory, and Fully Connected layers. The ASIC is also programmable, so it can support different numbers of layers, kernel sizes, and channel sizes for CNN and LSTM. Low power is achieved by sharing the same PEs among all three networks, and the main buffers used by the LSTM are fully shared with the FC layers. Under the real-time processing constraint, the ASIC consumes 2.56 uW of dynamic power and 224 uW of static power, for a total of only 226.56 uW; since the ASIC runs at a clock frequency of only 3 MHz, most of the power consumption comes from static power. The ASIC mainly processes PPG signals, and its main applications are biometric identification, signal selection, and blood glucose prediction. The first two applications use the LSTM and FC networks, while the blood glucose predictor uses CNN, LSTM, and FC. By combining these three networks, we can offer more stable and secure personal health care.
APA, Harvard, Vancouver, ISO, and other styles
29

LIN, YOU-YING, and 林佑穎. "Integration of CNN and LSTM for abnormal behavior detection." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/mamd39.

Full text
Abstract:
Master's thesis
National Yunlin University of Science and Technology
Department of Computer Science and Information Engineering
Academic year 107 (2018/19)
In recent years violent incidents have frequently been reported in the news, which has led people to pay more attention to public safety, and surveillance cameras have therefore become increasingly important. However, traditional surveillance systems only record, store and play back video, so they merely document events after the fact; if an abnormal event occurs, they cannot provide a warning. This study uses surveillance video to automatically identify the moment when human behavior becomes abnormal and immediately notifies security personnel, so that countermeasures can be taken more quickly and the level of security improved. This thesis therefore proposes an abnormal-behavior detection model based on deep learning, which applies object detection to pedestrian detection, tracks the detected pedestrians continuously, and then uses convolutional neural networks to extract action features from each tracking trajectory in order to predict abnormal behavior (fall, kick, punch) with a Long Short-Term Memory network. The experimental results show that the proposed method achieves good recognition on both the Fall Detection Dataset and the UT-Interaction dataset and can meet real-time detection requirements in real-world scenarios, with an accuracy of 83.31%.
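A rough sketch of the per-track classification stage described above, with pedestrian detection and tracking assumed to be done elsewhere: a CNN (MobileNetV2 is used here only as a stand-in, the thesis does not name its backbone) extracts features from each tracked person crop, and an LSTM labels the whole track as normal, fall, kick or punch. Crop size, track length and class set are assumptions.

```python
# Sketch: CNN features per tracked crop -> LSTM over the track.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

TRACK_LEN, CROP = 32, 128
cnn = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(CROP, CROP, 3))

def track_features(crops):
    """crops: (TRACK_LEN, CROP, CROP, 3) person crops from one tracked pedestrian."""
    return cnn.predict(preprocess_input(crops.astype("float32")), verbose=0)

classifier = models.Sequential([
    layers.Input(shape=(TRACK_LEN, 1280)),        # 1280 = MobileNetV2 pooled features
    layers.LSTM(64),
    layers.Dense(4, activation="softmax"),        # normal, fall, kick, punch
])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```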
APA, Harvard, Vancouver, ISO, and other styles
30

Coelho, Jorge Andre de Carvalho, and 卡橋安. "Music Structural Segmentation from Audio Signals using CNN Bidirectional LSTM." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/m2j6q8.

Full text
Abstract:
Master's thesis
National Tsing Hua University
Institute of Information Systems and Applications
Academic year 107 (2018/19)
In this paper, we investigate the problem of segmenting a piece of music into its structural components from its audio signal. We devise a deep learning architecture called the CNN Bidirectional LSTM model, which combines convolutional neural networks (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) to perform music boundary detection. The audio input to the model is first converted into one spectrogram and two SSMs that can be classified by the deep neural network. We also propose the use of Chroma Energy Normalized Statistics for this task. We show the resulting improvements over previous work with respect to precision and recall, verifying improvements of 11.2% and 6.58% F1-score at ±0.5 seconds and ±3 seconds tolerance, respectively.
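The sketch below is only in the spirit of the abstract, not the thesis's architecture: it assumes the spectrogram and the two SSM-derived representations have been resampled to a common time-by-feature grid and stacked as channels, runs a small CNN over them, and then lets a bidirectional LSTM score every frame as boundary or non-boundary. All shapes and layer sizes are assumptions.

```python
# Hedged sketch of a CNN--BiLSTM per-frame boundary detector.
import tensorflow as tf
from tensorflow.keras import layers, models

T, F, C = 512, 80, 3   # time steps, feature bins, channels (spectrogram + 2 SSM features)

inp = layers.Input(shape=(T, F, C))
x = layers.Conv2D(16, (3, 3), padding="same", activation="relu")(inp)
x = layers.MaxPooling2D(pool_size=(1, 4))(x)        # pool over the feature axis only
x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(1, 4))(x)
x = layers.Reshape((T, -1))(x)                      # one feature vector per time step
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
out = layers.Dense(1, activation="sigmoid")(x)      # per-frame boundary probability

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```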
APA, Harvard, Vancouver, ISO, and other styles
31

HUANG, GANG-CHENG, and 黃綱正. "Design of Malware Classification Method Combined with CNN and LSTM model." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/s4j874.

Full text
Abstract:
Master's thesis
National Taipei University of Technology
Graduate Institute of Automation Technology
Academic year 108 (2019/20)
This thesis presents malware classification using deep learning models. During the last decade, malware classification has been implemented not only with machine learning models (e.g., SVMs and decision trees) but also with convolutional neural networks (CNNs) and recurrent neural networks (RNNs), and many studies have shown that deep learning models achieve much higher accuracy than machine learning models. On the "Malimg" malware dataset, the proposed algorithm improves the accuracy from 84.92% to 87.79% by combining CNN and LSTM models.
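A minimal sketch (not the thesis's exact architecture) of combining CNN and LSTM on Malimg-style grayscale malware images: a CNN extracts feature maps, which are read row by row by an LSTM before classification. The image size and layer widths are assumptions, and the 25-class output reflects the commonly reported number of Malimg families.

```python
# Hedged sketch: CNN feature maps read row-wise by an LSTM for malware families.
import tensorflow as tf
from tensorflow.keras import layers, models

inp = layers.Input(shape=(64, 64, 1))               # assumed resized malware image
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
x = layers.MaxPooling2D()(x)                        # (32, 32, 32)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D()(x)                        # (16, 16, 64)
x = layers.Reshape((16, 16 * 64))(x)                # 16 feature "rows" as a sequence
x = layers.LSTM(128)(x)
out = layers.Dense(25, activation="softmax")(x)     # assumed 25 malware families

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```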
APA, Harvard, Vancouver, ISO, and other styles
32

Su, Ruei-Ye, and 蘇瑞燁. "A Bi-directional LSTM-CNN Model with Attention for Chinese Sentiment Analysis." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/2y9j7r.

Full text
Abstract:
Master's thesis
Shu-Te University
Master's Program, Department of Computer Science and Information Engineering
Academic year 107 (2018/19)
With the massive development of social media, people are used to sharing personal ideas and opinions on social media platforms, and most people hold personal viewpoints on specific topics. As time goes on, large amounts of data are generated that contain potentially valuable information from a business perspective. In the field of NLP (Natural Language Processing), sentiment analysis of Chinese messages is one of the major approaches to grasping Internet public opinion. This paper originally proposed LSAEB-CNN (Bi-LSTM Self-Attention of Emoticon-Based Convolutional Neural Network), a deep learning method that combines Bi-directional Long Short-Term Memory (Bi-LSTM) with Convolutional Neural Networks (CNN) and embeds emoticons into Self-Attention. The method can effectively identify different emotional polarities without external knowledge, but Self-Attention suffers from a problem of excessive attention. This paper therefore proposes a further improved method, Bi-LSTM Multi-Head Attention of Emoticon-Based Convolutional Neural Network (LMAEB-CNN), which extends the Self-Attention mechanism to Multi-Head Attention; most importantly, this lets each vector undergo multi-layer operations. The data were collected from Plurk, the micro-blogging service, and deep learning was conducted in Keras. Chinese micro-blogs were checked for sentiment polarity classification, and the study achieved an accuracy rate of about 98.9%, which is significantly higher than other methods.
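A hedged Keras sketch of the LMAEB-CNN idea as summarized above: token embeddings pass through a bidirectional LSTM, multi-head attention is applied over the LSTM outputs, and a 1-D CNN with pooling produces the polarity prediction. The vocabulary size, sequence length, number of heads and the way emoticons are embedded are assumptions, not the paper's configuration.

```python
# Sketch: Bi-LSTM -> multi-head attention -> 1-D CNN -> polarity.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB, MAXLEN, EMB = 20000, 100, 128

inp = layers.Input(shape=(MAXLEN,))
x = layers.Embedding(VOCAB, EMB)(inp)      # words and emoticon tokens share one vocabulary here
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
x = layers.MultiHeadAttention(num_heads=4, key_dim=32)(x, x)   # multi-head self-attention
x = layers.Conv1D(64, 3, padding="same", activation="relu")(x)
x = layers.GlobalMaxPooling1D()(x)
out = layers.Dense(1, activation="sigmoid")(x)                 # positive vs negative polarity

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```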
APA, Harvard, Vancouver, ISO, and other styles
33

宋恩喆. "An Agricultural Irrigation System with Soil Moisture Prediction Using Hybrid LSTM-CNN Learning for LoRaWAN Networks." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/ax6wy8.

Full text
Abstract:
Master's thesis
National Taipei University
Department of Computer Science and Information Engineering
Academic year 107 (2018/19)
Agricultural production is affected by many environmental factors, and irrigation water is one of them. Soil moisture is an important part of crop irrigation decision-making and can be used as a reference for when crops should be irrigated; predicting future changes in soil moisture can therefore serve as a reference for irrigation decisions and improve crop growth. In this paper, we present a LoRaWAN agricultural irrigation system with soil moisture prediction using hybrid LSTM-CNN learning. The proposed model has two channels, LSTM and CNN, which learn the long-term dependence and the local features of the data respectively, and combines the results of the two channels to produce the final soil moisture prediction. The training data come from environmental sensors deployed in a farm greenhouse, which connect through LoRaWAN and transmit their readings to a cloud application server, and from Central Weather Bureau weather forecast data. We use these data to build the proposed hybrid LSTM-CNN prediction model. The experimental results show that the model achieves the expected results and predicts the soil moisture value for the next hour.
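A sketch of the two-channel idea described above: the same sensor/weather sequence feeds an LSTM channel (long-term dependence) and a 1-D CNN channel (local features), and their outputs are concatenated to regress the soil moisture value one hour ahead. The window length, number of input variables and layer sizes are assumptions.

```python
# Hedged sketch of a hybrid LSTM-CNN regression model for soil moisture.
import tensorflow as tf
from tensorflow.keras import layers, models

STEPS, FEATURES = 24, 8        # e.g. 24 hourly readings of 8 sensor/weather variables

inp = layers.Input(shape=(STEPS, FEATURES))

lstm_branch = layers.LSTM(64)(inp)                          # long-term dependence
cnn_branch = layers.Conv1D(32, 3, activation="relu")(inp)   # local patterns
cnn_branch = layers.GlobalMaxPooling1D()(cnn_branch)

x = layers.Concatenate()([lstm_branch, cnn_branch])
x = layers.Dense(32, activation="relu")(x)
out = layers.Dense(1)(x)                                    # soil moisture one hour ahead

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
```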
APA, Harvard, Vancouver, ISO, and other styles
34

Hsu, Ya-Ling, and 徐雅玲. "Toward Automatic Pain-Level Detection for Emergency Patients using Fusion of CNN and LSTM Multimodal Audio-Video Features." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/kr4cgk.

Full text
Abstract:
Master's thesis
National Tsing Hua University
Department of Electrical Engineering
Academic year 106 (2017/18)
Nowadays, emergency departments are often considered the most efficient way to seek medical care. However, to allocate healthcare resources effectively, the triage classification system plays an important role in assessing the severity of illness of patients arriving at the emergency department. Several factors are listed in the Taiwan Triage and Acuity Scale (TTAS), and the self-reported pain intensity numerical rating scale (NRS) is one of the major modifiers of the current TTAS-based triage system. In clinical practice, physicians and nurses have noticed how difficult it is to apply this instrument systematically, especially for elderly people, foreigners, or patients with a low education level. Triage nurses therefore often select the pain level based on their own observations instead of soliciting an answer from the patient, which creates deviations in the consistency and validity of the triage classification system. In this work, carried out in cooperation with emergency physicians at Linkou Chang Gung Memorial Hospital, we extract multimodal behavioral signals of facial expression and vocal characteristics from patients and model these behaviors with CNN and LSTM machine learning models, respectively. The experimental results show accuracies of 77.1% and 55.7% for two-class and three-class pain recognition, respectively. Furthermore, the experimental analysis also shows that pain level has a significant relationship with patients' facial expressions and vocal characteristics.
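A hedged sketch of a late-fusion model in the spirit of the abstract: a CNN branch processes face images and an LSTM branch processes frame-level vocal features, and their outputs are fused for two-class pain recognition. All shapes, feature choices and layer sizes are illustrative assumptions, not the study's model.

```python
# Sketch: CNN over faces + LSTM over vocal features, fused for pain recognition.
import tensorflow as tf
from tensorflow.keras import layers, models

face = layers.Input(shape=(64, 64, 3))                 # assumed cropped face image
v = layers.Conv2D(32, 3, activation="relu")(face)
v = layers.MaxPooling2D()(v)
v = layers.Conv2D(64, 3, activation="relu")(v)
v = layers.GlobalAveragePooling2D()(v)

audio = layers.Input(shape=(100, 40))                  # e.g. 100 frames of 40-dim vocal features
a = layers.LSTM(64)(audio)

x = layers.Concatenate()([v, a])
x = layers.Dense(64, activation="relu")(x)
out = layers.Dense(2, activation="softmax")(x)         # pain vs no/low pain

model = models.Model([face, audio], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```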
APA, Harvard, Vancouver, ISO, and other styles
35

Zhou, Quan. "Bidirectional long short-term memory network for proto-object representation." Thesis, 2018. https://hdl.handle.net/2144/31682.

Full text
Abstract:
Researchers have developed many visual saliency models in order to advance the technology in computer vision. Neural networks, Convolutional Neural Networks (CNNs) in particular, have successfully differentiated objects in images through feature extraction. Meanwhile, Cummings et al. have proposed a proto-object image saliency (POIS) model showing that perceptual objects or shapes can be modelled through a bottom-up saliency algorithm. Inspired by their work, this research aims to explore the features embedded in proto-object representations and to utilize artificial neural networks (ANNs) to capture and predict the saliency output of POIS. A combination of a CNN and a bidirectional long short-term memory (BLSTM) neural network is proposed for this saliency model as a machine learning alternative to the border ownership and grouping mechanism in POIS. As ANNs become more efficient at visual saliency tasks, the results of this work would extend their application in computer vision through a successful implementation of proto-object based saliency.
APA, Harvard, Vancouver, ISO, and other styles
36

Raptis, Konstantinos. "The clash between two worlds in human action recognition: supervised feature training vs Recurrent ConvNet." Thesis, 2016. https://doi.org/10.7912/C2CW7G.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Action recognition has been an active research topic for over three decades. There are various applications of action recognition, such as surveillance, human-computer interaction, and content-based retrieval. Recently, research has focused on movies, web videos, and TV-show datasets. The nature of these datasets makes action recognition very challenging due to scene variability and complexity, namely background clutter, occlusions, viewpoint changes, fast irregular motion, and a large spatio-temporal search space (articulation configurations and motions). The use of local space-time image features shows promising results, avoiding the cumbersome and often inaccurate frame-by-frame segmentation (boundary estimation). We focus on two state-of-the-art methods for the action classification problem: dense trajectories and recurrent neural networks (RNNs). Dense trajectories use typical supervised training (e.g., with Support Vector Machines) of features such as 3D-SIFT, extended SURF, HOG3D, and local trinary patterns; the main idea is to densely sample these features in each frame and track them through the sequence based on optical flow. On the other hand, the deep neural network uses the input frames to detect actions and produce part proposals, i.e., estimate information on body parts (shapes and locations). We compare these two approaches, indicative of what is used today, qualitatively and numerically, and describe our conclusions with respect to accuracy and efficiency.
APA, Harvard, Vancouver, ISO, and other styles
37

Do, Ngoc. "Použití rekurentních neuronových sítí pro automatické rozpoznávání řečníka, jazyka a pohlaví." Master's thesis, 2016. http://www.nusl.cz/ntk/nusl-346774.

Full text
Abstract:
Title: Neural networks for automatic speaker, language, and sex identification Author: Bich-Ngoc Do Department: Institute of Formal and Applied Linguistics Supervisor: Ing. Mgr. Filip Jurek, Ph.D., Institute of Formal and Applied Linguistics and Dr. Marco Wiering, Faculty of Mathematics and Natural Sciences, University of Groningen Abstract: Speaker recognition is a challenging task and has applications in many areas, such as access control or forensic science. On the other hand, in recent years, the deep learning paradigm and its branch, deep neural networks, have emerged as powerful machine learning techniques and achieved state-of-the-art results in many fields of natural language processing and speech technology. Therefore, the aim of this work is to explore the capability of a deep neural network model, recurrent neural networks, in speaker recognition. Our proposed systems are evaluated on the TIMIT corpus using a speaker identification task. In comparison with other systems under the same test conditions, our systems could not surpass reference ones due to the sparsity of validation data. In general, our experiments show that the best system configuration is a combination of MFCCs with their dynamic features and a recurrent neural network model. We also experiment with recurrent neural networks and convolutional neural...
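A small sketch of the feature setup the abstract identifies as best, MFCCs with their dynamic (delta and delta-delta) features, which would then be fed to a recurrent network for speaker identification. The placeholder random signal stands in for a TIMIT utterance, librosa is assumed for feature extraction, and the model sizes are illustrative; the 630-speaker output reflects the usual TIMIT speaker count.

```python
# Sketch: MFCC + delta features (librosa) feeding an LSTM speaker classifier.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

sr = 16000
y = np.random.randn(2 * sr).astype(np.float32)       # placeholder signal, not a TIMIT utterance
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, T)
feats = np.vstack([mfcc,
                   librosa.feature.delta(mfcc),             # delta (dynamic) features
                   librosa.feature.delta(mfcc, order=2)]).T  # (T, 39)

N_SPEAKERS = 630                                     # assumed TIMIT speaker count
model = models.Sequential([
    layers.Input(shape=(None, 39)),
    layers.LSTM(128),                                # recurrent model over the MFCC sequence
    layers.Dense(N_SPEAKERS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```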
APA, Harvard, Vancouver, ISO, and other styles
38

Fu, Yang. "Reconnaissance de l'émotion thermique." Thèse, 2017. http://hdl.handle.net/1866/19371.

Full text
Abstract:
Pour améliorer les interactions homme-ordinateur dans les domaines de la santé, de l'e-learning et des jeux vidéos, de nombreux chercheurs ont étudié la reconnaissance des émotions à partir des signaux de texte, de parole, d'expression faciale, de détection d'émotion ou d'électroencéphalographie (EEG). Parmi eux, la reconnaissance d'émotion à l'aide d'EEG a permis une précision satisfaisante. Cependant, le fait d'utiliser des dispositifs d'électroencéphalographie limite la gamme des mouvements de l'utilisateur. Une méthode non envahissante est donc nécessaire pour faciliter la détection des émotions et ses applications. C'est pourquoi nous avons proposé d'utiliser une caméra thermique pour capturer les changements de température de la peau, puis appliquer des algorithmes d'apprentissage machine pour classer les changements d'émotion en conséquence. Cette thèse contient deux études sur la détection d'émotion thermique avec la comparaison de la détection d'émotion basée sur EEG. L'un était de découvrir les profils de détection émotionnelle thermique en comparaison avec la technologie de détection d'émotion basée sur EEG; L'autre était de construire une application avec des algorithmes d'apprentissage en machine profonds pour visualiser la précision et la performance de la détection d'émotion thermique et basée sur EEG. Dans la première recherche, nous avons appliqué HMM dans la reconnaissance de l'émotion thermique, et après avoir comparé à la détection de l'émotion basée sur EEG, nous avons identifié les caractéristiques liées à l'émotion de la température de la peau en termes d'intensité et de rapidité. Dans la deuxième recherche, nous avons mis en place une application de détection d'émotion qui supporte à la fois la détection d'émotion thermique et la détection d'émotion basée sur EEG en appliquant les méthodes d'apprentissage par machine profondes - Réseau Neuronal Convolutif (CNN) et Mémoire à long court-terme (LSTM). La précision de la détection d'émotion basée sur l'image thermique a atteint 52,59% et la précision de la détection basée sur l'EEG a atteint 67,05%. Dans une autre étude, nous allons faire plus de recherches sur l'ajustement des algorithmes d'apprentissage machine pour améliorer la précision de détection d'émotion thermique.
To improve computer-human interactions in the areas of healthcare, e-learning and video games, many researchers have studied recognizing emotions from text, speech, facial expressions, emotion detection, or electroencephalography (EEG) signals. Among them, emotion recognition using EEG has achieved satisfying accuracy. However, wearing electroencephalography devices limits the range of user movement, so a noninvasive method is required to facilitate emotion detection and its applications. That is why we proposed using a thermal camera to capture skin temperature changes and then applying machine learning algorithms to classify emotion changes accordingly. This thesis contains two studies on thermal emotion detection compared with EEG-based emotion detection. One was to find the thermal emotion detection profiles in comparison with EEG-based emotion detection technology; the other was to implement an application with deep machine learning algorithms to visually display the accuracy and performance of both thermal and EEG-based emotion detection. In the first study, we applied HMMs to thermal emotion recognition and, after comparing with EEG-based emotion detection, identified emotion-related features of skin temperature in terms of intensity and rapidity. In the second study, we implemented an emotion detection application supporting both thermal and EEG-based emotion detection using the deep machine learning methods Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The accuracy of thermal-image-based emotion detection reached 52.59% and the accuracy of EEG-based detection reached 67.05%. In further work, we will do more research on adjusting the machine learning algorithms to improve thermal emotion detection precision.
APA, Harvard, Vancouver, ISO, and other styles
