Dissertations / Theses on the topic 'LSTM-CNN'
Consult the top 38 dissertations / theses for your research on the topic 'LSTM-CNN.'
Gessle, Gabriel, and Simon Åkesson. "A comparative analysis of CNN and LSTM for music genre classification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260138.
The music industry has seen a large increase in the number of ways to find and distribute music. This comes with drawbacks, however: as the amount of data grows quickly, it becomes harder to manage well. Audio files contain a great deal of information that can be extracted to make this process easier. It is possible to use the various types of information in the file, but the best way to handle them is not always known. This report compares two deep learning methods, the convolutional neural network (CNN) and long short-term memory (LSTM), trained on mel-frequency cepstral coefficients (MFCCs) for music genre classification, in the hope of making audio data easier to handle for future use. The models were tested on two datasets, GTZAN and FMA, where the CNN reached an accuracy of 56.0% and 50.5% when trained on the respective datasets. It outperformed the LSTM model, which reached an accuracy of 42.0% and 33.5%.
Graffi, Giacomo. "A novel approach for Credit Scoring using Deep Neural Networks with bank transaction data." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Olin, Per. "Evaluation of text classification techniques for log file classification." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166641.
Suresh, Sreerag. "An Analysis of Short-Term Load Forecasting on Residential Buildings Using Deep Learning Models." Thesis, Virginia Tech, 2020. http://hdl.handle.net/10919/99287.
Master of Science
Building energy load forecasting is becoming an increasingly important task with the rapid deployment of smart homes, the integration of renewables into the grid and the advent of decentralized energy systems. Residential load forecasting has been a challenging task because residential load is highly stochastic. Deep learning models have shown tremendous promise on time-series and sequential data and have been used successfully for short-term load forecasting. Although other studies have looked at deep learning models for building energy forecasting, most of them have considered only a single home or the aggregate load of a collection of homes. This study aims to address this gap and serves as an analysis of short-term load forecasting on 3 communities of residential buildings. Model performance across all homes is analyzed in detail. Deep learning models are used in this study, and their efficacy is measured against a simple ANN model.
Terefe, Adisu Wagaw. "Handwritten Recognition for Ethiopic (Ge’ez) Ancient Manuscript Documents." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-288145.
A handwriting recognition system is a process for learning a pattern from a given image of text. The recognition process typically combines a computer vision task with sequence learning techniques. Transcribing text from scanned images remains a challenging problem, especially when the documents are heavily degraded or contain excessive dust noise. Nowadays there are several handwriting recognition systems, both commercial and free, especially for Latin-based languages. However, no previous study has been built for Ge’ez handwritten ancient manuscript documents, even though this language holds many mysteries of the past in the human history of science, architecture, medicine and astronomy. In this thesis we present two separate recognition systems: (1) a character-level recognition system that combines image recognition for character segmentation from ancient books with a vanilla Convolutional Neural Network (CNN) for recognizing characters; (2) an end-to-end, segmentation-free handwriting recognition system that uses a CNN and a Multi-Dimensional Recurrent Neural Network (MDRNN) with Connectionist Temporal Classification (CTC) for Ethiopic (Ge’ez) manuscript documents. The proposed character recognition model exceeds 97.78% accuracy. The second model, by contrast, gives an encouraging result which indicates that the properties of the language need further study for better recognition of all ancient books.
Rintala, Jonathan. "Speech Emotion Recognition from Raw Audio using Deep Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278858.
Traditionally, in speech emotion recognition, models require a large number of manually engineered features and intermediate representations, such as spectrograms, for training. However, engineering such features by hand often requires both domain-specific expertise and resources. Recently, emerging end-to-end deep learning models, which extract features and learn directly from the raw audio signal, have been explored. One previous approach has been to combine parallel CNNs with different filter lengths to extract multiple temporal features from the audio signal and then pass the resulting sequence into a recurrent neural network. Other previous studies have also reached high accuracy using local feature learning blocks (LFLBs) to reduce the dimensionality of the raw audio signal, thereby extracting the most important information from the audio. This study therefore combines the idea of using LFLBs for feature extraction with a block of parallel CNNs with different filter lengths to capture multi-temporal features; the result is finally fed into an LSTM layer for global learning of contextual information. To the best of our knowledge, such a combined architecture has not yet been investigated. The study also investigates different configurations of such an architecture. The proposed model is trained and evaluated on the well-known speech databases EmoDB and RAVDESS, in both a speaker-dependent and a speaker-independent setting. The results indicate that the proposed architecture can produce results comparable to the state of the art, even though no data augmentation or advanced preprocessing was used. Three parallel CNN layers gave the highest accuracy, together with a series of modified LFLBs that use average pooling and ReLU as the activation function. This shows the benefits of leaving feature learning to the network and opens up interesting future research on time complexity and on the trade-off between introducing complexity in the preprocessing or in the model architecture itself.
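As a rough illustration of the parallel-branch idea in the abstract above, the following pure-Python sketch runs several 1D filters of different lengths over the same signal and stacks their outputs per time step, which is the kind of sequence a recurrent layer could then consume. It is a toy under stated assumptions: the function names `conv1d_same` and `parallel_branches`, the fixed (not learned) kernels and the zero padding are all illustrative choices, not details taken from the thesis.

```python
def conv1d_same(signal, kernel):
    """Cross-correlate signal with kernel, zero-padded so the output keeps the input length."""
    k = len(kernel)
    left = (k - 1) // 2
    padded = [0.0] * left + list(signal) + [0.0] * (k - 1 - left)
    return [sum(padded[t + i] * kernel[i] for i in range(k))
            for t in range(len(signal))]


def parallel_branches(signal, kernels):
    """Apply every kernel to the same signal and stack the results per time step."""
    outputs = [conv1d_same(signal, kernel) for kernel in kernels]
    # zip(*outputs) turns "one list per branch" into "one feature vector per time step".
    return [list(step) for step in zip(*outputs)]


# Hypothetical toy input and filters of length 1, 2 and 3 (assumed values).
toy_signal = [1.0, 2.0, 3.0, 4.0]
toy_kernels = [[1.0], [1.0, 1.0], [1.0, 1.0, 1.0]]
features = parallel_branches(toy_signal, toy_kernels)
```

Because every branch keeps the input length, the branch outputs line up in time and can be concatenated into one feature vector per step; in a trained network the kernels would be learned rather than fixed.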
Kapoor, Prince. "Shoulder Keypoint-Detection from Object Detection." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/38015.
Engström, Olof. "Deep Learning for Anomaly Detection in Microwave Links : Challenges and Impact on Weather Classification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-276676.
Artificial intelligence has received a great deal of attention in various fields of technology and science because of its many promising applications. In today's society, weather classification models with high accuracy are of the utmost importance. An alternative to conventional weather radar is to use measured attenuation data from microwave links as input to deep-learning-based weather classification models. Detecting anomalies in the measured attenuation data is of great importance, because the reliability of a classification model decreases if the training data contains anomalies. Designing an accurate classification model is difficult because of the lack of predefined features for different types of weather conditions, and because of the specific domain requirements often imposed on execution time and detection sensitivity. In this thesis we investigate the relationship between anomalies in measured attenuation data from microwave links and misclassifications made by a weather classification model. To this end, we evaluate anomaly detection in the context of weather classification using two deep learning models, based on long short-term memory (LSTM) networks and convolutional neural networks (CNN). We evaluate the feasibility and generalizability of the proposed methodology in an industrial case study at Ericsson AB. The results show that both proposed methods can detect anomalies that correlate with misclassifications made by the weather classification model. The LSTM model performed better than the CNN model both in terms of peak performance on a single link and in terms of average performance across all 5 tested links, but the performance of the CNN model was more consistent.
Chen, Yani. "Deep Learning based 3D Image Segmentation Methods and Applications." Ohio University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1547066297047003.
Lin, Alvin. "Video Based Automatic Speech Recognition Using Neural Networks." DigitalCommons@CalPoly, 2020. https://digitalcommons.calpoly.edu/theses/2343.
Lagerhjelm, Linus. "Extracting Information from Encrypted Data using Deep Neural Networks." Thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-155904.
Volný, Miloš. "Využití umělé inteligence jako podpory pro rozhodování v podniku." Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2019. http://www.nusl.cz/ntk/nusl-399447.
Mazhar, Osama. "Vision-based human gestures recognition for human-robot interaction." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS044.
In light of the factories of the future, to ensure productive, safe and effective interaction between robots and human coworkers, it is imperative that the robot extracts the essential information about the coworker. To address this, deep learning solutions are explored and a reliable human gesture detection framework is developed in this work. Our framework is able to robustly detect static hand gestures plus upper-body dynamic gestures. For static hand gesture detection, openpose is integrated with Kinect V2 to obtain a pseudo-3D human skeleton. With the help of 10 volunteers, we recorded an image dataset, opensign, that contains Kinect V2 RGB and depth images of 10 alpha-numeric static hand gestures taken from American Sign Language. The "Inception V3" neural network is adapted and trained to detect static hand gestures in real time. Subsequently, we extend our gesture detection framework to recognize upper-body dynamic gestures. A spatial-attention-based dynamic gesture detection strategy is proposed that employs a multi-modal "Convolutional Neural Network - Long Short-Term Memory" deep network to extract spatio-temporal dependencies in pure RGB video sequences. The convolutional neural network blocks are pre-trained on our static hand gestures dataset opensign, which allows efficient extraction of hand features. Our spatial attention module focuses on large-scale movements of the upper limbs plus hand images for subtle hand/finger movements, to efficiently distinguish gesture classes. This module additionally exploits 2D upper-body pose to estimate the distance of the user from the sensor for scale normalization and to determine the parameters of the hand bounding boxes without the need for a depth sensor. The information typically extracted from a depth camera in similar strategies is learned from the opensign dataset. Thus the proposed gesture recognition strategy can be implemented on any system with a monocular camera. Afterwards, we briefly explore 3D human pose estimation strategies for monocular cameras. To estimate 3D human pose, a hybrid strategy is proposed that combines the merits of discriminative 2D pose estimators with those of model-based generative approaches. Our method optimizes an objective function that minimizes the discrepancy between the position- and scale-normalized 2D pose obtained from openpose and a virtual 2D projection of a kinematic human model. For real-time human-robot interaction, an asynchronous distributed system is developed to integrate our static hand gesture detector module with OpenPHRI, an open-source physical human-robot interaction library. We validate the performance of the proposed framework through a teach-by-demonstration experiment with a robotic manipulator.
Shaif, Ayad. "Predictive Maintenance in Smart Agriculture Using Machine Learning : A Novel Algorithm for Drift Fault Detection in Hydroponic Sensors." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-42270.
Evholt, David, and Oscar Larsson. "Generative Adversarial Networks and Natural Language Processing for Macroeconomic Forecasting." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273422.
Macroeconomic forecasting has long been a difficult challenge. Today it is most often tackled with time series analysis, and few attempts have been made with machine learning. In this thesis, a generative adversarial network (GAN) is used to predict US unemployment, with results that beat all benchmarks set by an ARIMA. An attempt is also made to use data from Twitter and the natural language processing (NLP) model DistilBERT. These models do not beat the benchmarks but show promising results. The models are further tested on the US stock index S&P 500. For these models, Twitter data improved the results, which shows the potential of social media data when applied to more irregular indices that lack clear seasonality and are more sensitive to trends in public discourse. The results show that Twitter data can be used to find trends in both US unemployment and the S&P 500 index. This lays the foundation for further research into NLP-GAN models for macroeconomic forecasting based on social media data.
Holm, Noah, and Emil Plynning. "Spatio-temporal prediction of residential burglaries using convolutional LSTM neural networks." Thesis, KTH, Geoinformatik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229952.
Näslund, Per. "Artificial Neural Networks in Swedish Speech Synthesis." Thesis, KTH, Tal-kommunikation, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-239350.
Speech synthesizers, also known as TTS (text-to-speech), are widely used in smart assistants and many other applications. Contemporary research applies machine learning and artificial neural networks (ANNs) to perform speech synthesis. Studies have shown that these systems perform better than the older concatenative and parametric methods. This report explores ANN-based TTS methods, and one of the methods is implemented for the Swedish language. The method used is called "Tacotron" and is a first step toward end-to-end TTS based on neural networks. The method ties together several different ANN techniques. The resulting system is compared with a parametric TTS through a graded preference test involving 20 Swedish-speaking subjects. A statistically significant preference for the ANN-based TTS system is established. The subjects indicate that the ANN-based TTS system performs better than the parametric one in terms of audio quality and naturalness, but shows shortcomings in intelligibility.
Ďuriš, Denis. "Detekce ohně a kouře z obrazového signálu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-412968.
Broomé, Sofia. "Objectively recognizing human activity in body-worn sensor data with (more or less) deep neural networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210243.
This thesis tests how well movement patterns can be distinguished in accelerometer data using the branch of machine learning known as deep learning, in which deep artificial neural networks of nodes approximate functions that map from the domain of sensor data to predefined categories of activities such as walking, standing, sitting or lying down. There is interest from the medical side in being able to measure physical activity objectively, among other reasons because a correlation has been shown between increased health risks in children and their amount of daily screen time. Such measurements should preferably be possible with non-invasive, low-cost equipment so that larger studies can be carried out. Simpler network architectures and re-implementations of the state of the art in Human Activity Recognition (HAR) are tested both on a benchmark dataset and on data collected in collaboration with the Department of Public Health Sciences at Karolinska Institutet, and results are reported for different choices of possible classifications and different numbers of dimensions per measurement point. The results achieved (95% F1-score) on a 4-class and a 5-class problem are comparable to the best previously published results for activity recognition, which is notable given that considerably fewer accelerometers were used here than in the studies referred to. In addition to the classification results reported, this work contributes a newly collected and category-labelled dataset, KTH-KI-AA. It is comparable in number of data points to widely used benchmark datasets in the HAR field.
Gopchandani, Sandhya. "Using Word Embeddings to Explore the Language of Depression on Twitter." ScholarWorks @ UVM, 2019. https://scholarworks.uvm.edu/graddis/1072.
Mukhedkar, Dhananjay. "Polyphonic Music Instrument Detection on Weakly Labelled Data using Sequence Learning Models." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279060.
Polyphonic, or multiple, music instrument detection is a difficult problem compared with detecting single or solo instruments in an audio recording. Since music is time series data, it can be modelled using sequence learning methods within deep learning. Recently, the Temporal Convolutional Network (TCN) has been shown to outperform conventional Recurrent Neural Networks (RNNs) on several sequence modelling tasks. Although there have been significant improvements in deep learning methods, data scarcity becomes a problem when training large-scale models. Weakly labelled data is an alternative, in which a clip is annotated for the presence or absence of instruments without specifying the times at which an instrument sounds. This study investigates how the TCN model compares with a Long Short-Term Memory (LSTM) model when trained on weakly labelled datasets. The results showed successful training of both models, together with generalization on a separate dataset. The comparison showed that the TCN performed better than the LSTM, but only marginally. From the experiments conducted, one therefore cannot explicitly conclude that the TCN is convincingly a better choice than the LSTM for instrument detection, but it is definitely a strong alternative.
Johansson, Alexander, and Oscar Sandberg. "A COMPARATIVE STUDY OF DEEP-LEARNING APPROACHES FOR ACTIVITY RECOGNITION USING SENSOR DATA IN SMART OFFICE ENVIRONMENTS." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20928.
The purpose of this study is to compare three deep learning networks to determine which produces the highest prediction accuracy. Accuracy is measured as the networks try to predict the number of people in the room where observation takes place. In addition to comparing the three deep learning networks with each other, we compare them with a traditional machine learning approach to find out whether deep learning methods perform better than traditional methods. The study uses design and creation, a methodology that places great emphasis on developing an IT product and uses the product as its contribution to new knowledge. The methodology has five phases; we chose to iterate between the development and evaluation phases. Observation is the data generation method used to collect data. Data generation lasted three weeks and resulted in 31287 rows of data recorded in our database. One of our deep learning networks produced an accuracy of 78.2%, while the other two produced accuracies of 45.6% and 40.3%, respectively. For the traditional method we used decision trees with two different formulas, which produced accuracies of 61.3% and 57.2%, respectively. The results show that, of the three deep learning networks included in this study, only one produces a higher predictive accuracy than the traditional ML approaches. This does not necessarily mean that deep learning approaches in general produce a higher predictive accuracy than traditional machine learning approaches. Possible further work includes more experimentation with the dataset and hyperparameters, gathering more data and validating it properly, and comparing additional deep learning and machine learning approaches.
Hedar, Sara. "Applying Machine Learning Methods to Predict the Outcome of Shots in Football." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-414774.
Hamerník, Pavel. "Využití hlubokého učení pro rozpoznání textu v obrazu grafického uživatelského rozhraní." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403823.
Kvita, Jakub. "Popis fotografií pomocí rekurentních neuronových sítí." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255324.
Kramář, Denis. "Analýza zvukových nahrávek pomocí hlubokého učení." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2021. http://www.nusl.cz/ntk/nusl-442571.
Albert, Florea George, and Filip Weilid. "Deep Learning Models for Human Activity Recognition." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20201.
The Augmented Multi-party Interaction (AMI) Meeting Corpus database is used to investigate group activity recognition in an office environment. The AMI Meeting Corpus provides researchers with remote controlled meetings and natural meetings in an office environment; the meeting scenario takes place in an office room sized for four people. To achieve group activity recognition, video frames and 2-dimensional audio spectrograms were extracted from the AMI database. The video frames were RGB color images, and the audio spectrograms had one color channel. The video frames were produced in batches so that temporal features could be evaluated together with the audio spectrograms. It has been shown that including temporal features, both during model training and when predicting the behavior of an activity, increases the validation accuracy compared to models that use only spatial features [1]. Deep learning architectures have been implemented to recognize different human activities in the AMI office environment using the data extracted from the AMI database. The neural network models were built using the Keras API together with the TensorFlow library. The architecture types investigated in this project were Residual Neural Network, Visual Geometry Group 16, Inception V3 and RCNN (Recurrent Neural Network). ImageNet weights, provided by the Keras API and optimized for each base model [2], were used to initialize the base models, which use these weights when extracting features from the input data. Feature extraction using ImageNet weights or random weights together with the base models showed promising results. Both the deep learning approach using dense layers and the LSTM spatio-temporal sequence prediction were implemented successfully.
Hsu, Tsu-Jui, and 許祖瑞. "Programmable CNN LSTM ASIC Design for Biomedical Application." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/x35asn.
國立交通大學
電子研究所
107
Mobile health devices are a key factor in personal health care. They can analyze a user's physical well-being instantly, combining artificial intelligence, big data, the Internet of Things, sensors and more. Unlike cloud computing, a mobile health device has very limited computing resources and power, so it must rely on edge computing, which requires low power and real-time processing. To achieve both, we design an ASIC that can process multiple deep learning networks: convolutional neural networks (CNN), long short-term memory (LSTM) and fully connected (FC) layers. We also make the ASIC programmable, so that it supports different numbers of layers, kernel sizes and channel sizes for CNN and LSTM. The ASIC achieves low power by sharing the same PE among all three networks, and the main buffers used by the LSTM are fully shared with the FC. Under the real-time processing constraint, the ASIC consumes 2.56 uW of dynamic power and 224 uW of static power, for a total of only 226.56 uW. Because the ASIC runs at a clock frequency of only 3 MHz, most of the power consumption is static. The ASIC mainly processes PPG signals, and its main applications are a Biometric Identification, a Signal Selector and a Blood Glucose Predictor. The first two applications use the LSTM and FC networks; the Blood Glucose Predictor uses CNN, LSTM and FC. By combining these three networks, we can offer more stable and secure personal health care.
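The power figures quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below only re-adds the two reported numbers and derives the static share; the variable names are illustrative, and nothing here models the ASIC itself.

```python
# Figures as reported in the abstract, in microwatts.
dynamic_uw = 2.56
static_uw = 224.0

# Total power is the sum of the dynamic and static contributions.
total_uw = dynamic_uw + static_uw

# Fraction of the total that is static (leakage) power.
static_share = static_uw / total_uw

print(f"total = {total_uw:.2f} uW, static share = {static_share:.1%}")
```

The sum reproduces the stated 226.56 uW, and the static share comes out at roughly 98.9%, consistent with the claim that most of the consumption is static at a 3 MHz clock.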
LIN, YOU-YING, and 林佑穎. "Integration of CNN and LSTM for abnormal behavior detection." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/mamd39.
國立雲林科技大學
資訊工程系
107
In recent years, violent incidents have often been reported in the news, which has led people to pay attention to public safety. As a result, video surveillance is becoming important. However, a traditional surveillance system can only record, store and play back video, so it merely documents events; when an abnormal event occurs, it cannot issue a warning. This study uses surveillance video to automatically identify the moment when human behavior becomes abnormal and immediately notify security personnel, so that countermeasures can be taken more quickly and the level of security improved. This thesis therefore proposes an abnormal behavior detection model based on deep learning. It applies object detection to detect pedestrians, tracks the detected pedestrians continuously, and then uses convolutional neural networks to extract the action features of each tracked trajectory in order to predict abnormal behavior (fall, kick, punch) with a Long Short-Term Memory network. The experimental results show that the proposed method recognizes behavior well on both the Fall Detection Dataset and the UT-interaction dataset and can meet real-time detection requirements in real-world scenarios. The accuracy reaches 83.31%.
Coelho, Jorge Andre de Carvalho, and 卡橋安. "Music Structural Segmentation from Audio Signals using CNN Bidirectional LSTM." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/m2j6q8.
國立清華大學
資訊系統與應用研究所
107
In this paper, we investigate the problem of segmenting a piece of music into its structural components from its audio signal. We devise a deep neural network architecture, the CNN Bidirectional LSTM model, which combines convolutional neural networks (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) to perform music boundary detection. The audio input to the model is first converted into one spectrogram and two self-similarity matrices (SSMs) that can be classified by the deep neural network. We also propose the use of Chroma Energy Normalized Statistics for this task. We show the resulting improvements over previous work with respect to precision and recall: the F1-score improves by 11.2% and 6.58% at tolerances of ±0.5 seconds and ±3 seconds, respectively.
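The tolerance-based F1-score mentioned in the abstract above can be illustrated with a small sketch. It counts an estimated boundary as a hit when it lies within the tolerance of a not-yet-matched reference boundary, then derives precision, recall and F1. The function name `boundary_f1`, the greedy one-to-one matching and the example times are assumptions for illustration; the thesis may use a different matching procedure or an evaluation library.

```python
def boundary_f1(reference, estimated, tolerance):
    """Precision, recall and F1 for boundary times (seconds) within +/- tolerance."""
    ref = sorted(reference)
    est = sorted(estimated)
    hits = 0
    i = j = 0
    # Walk both sorted lists, pairing each boundary with at most one partner.
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # estimate is too early to match any remaining reference
        else:
            i += 1  # reference is too early to match any remaining estimate
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical boundary times in seconds (not data from the thesis).
reference_times = [10.0, 30.0, 60.0]
estimated_times = [10.3, 32.0, 75.0]
strict = boundary_f1(reference_times, estimated_times, tolerance=0.5)
loose = boundary_f1(reference_times, estimated_times, tolerance=3.0)
```

With these made-up times, only one estimate falls within 0.5 seconds of a reference boundary, while two fall within 3 seconds, which shows why scores at the looser tolerance are typically higher.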
HUANG, GANG-CHENG, and 黃綱正. "Design of Malware Classification Method Combined with CNN and LSTM model." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/s4j874.
國立臺北科技大學
自動化科技研究所
108
This thesis presents malware classification using deep learning models. Over the last decade, malware classification has been implemented not only with machine learning methods (e.g., SVM, decision trees) but also with convolutional neural networks (CNNs) and recurrent neural networks (RNNs). Many studies have shown experimentally that deep learning models achieve much higher accuracy than machine learning models. On “Malimg”, a computer worm dataset, the proposed algorithm improved accuracy from 84.92% to 87.79% by combining CNN and LSTM models.
Su, Ruei-Ye, and 蘇瑞燁. "A Bi-directional LSTM-CNN Model with Attention for Chinese Sentiment Analysis." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/2y9j7r.
樹德科技大學
資訊工程系碩士班
107
With the massive growth of social media, people have become used to sharing personal ideas and opinions on social media platforms, and most people hold personal viewpoints on specific topics. Over time, large amounts of data are generated that contain potentially valuable information from a business perspective. In the field of natural language processing (NLP), sentiment analysis of Chinese messages is one of the major approaches to grasping online public opinion. This paper originally proposed LSAEB-CNN (Bi-LSTM Self-Attention of Emoticon-Based Convolutional Neural Network), a deep learning method that combines Bi-directional Long Short-Term Memory (Bi-LSTM) with Convolutional Neural Networks (CNN) and embeds emoticons into Self-Attention. The method could effectively identify different emotional polarities without external knowledge, but its Self-Attention concentrated attention excessively. This paper therefore proposes a further improvement on Self-Attention: Bi-LSTM Multi-Head Attention of Emoticon-Based Convolutional Neural Network (LMAEB-CNN). Most importantly, the method lets each vector undergo multi-layer operations. The data was collected from the micro-blogging service Plurk, and the deep learning models were built in Keras. Chinese micro-blog posts were classified by sentiment polarity, and the study achieved an accuracy of about 98.9%, which is significantly higher than other methods.
宋恩喆. "An Agricultural Irrigation System with Soil Moisture Prediction Using Hybrid LSTM-CNN Learning for LoRaWAN Networks." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/ax6wy8.
National Taipei University (國立臺北大學)
Department of Computer Science and Information Engineering (資訊工程學系)
107
Agricultural production is affected by many environmental factors, one of which is irrigation water. Soil moisture is an important input to crop irrigation decisions and can serve as a reference for when crops should be irrigated. Predicting future changes in soil moisture can therefore inform irrigation decisions and thereby improve crop growth. In this paper, we present a LoRaWAN agricultural irrigation system with soil moisture prediction using hybrid LSTM-CNN learning. The proposed model has two channels, LSTM and CNN, which learn the long-term dependence and the local features of the data, respectively; the results of the two channels are combined to produce the final soil moisture prediction. The training data come from environmental sensors deployed in a farm greenhouse, which connect through LoRaWAN and transmit their data to a cloud application server. Another source of training data is weather forecast data from the Central Weather Bureau. We use these data to build the proposed hybrid LSTM-CNN prediction model. The experimental results show that the proposed model achieves the expected results and can predict the soil moisture value for the next hour.
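The two-channel idea described above can be sketched as a plain NumPy forward pass: one channel runs an LSTM over the sensor window, the other applies a 1-D convolution with global max pooling, and the two feature vectors are concatenated before a linear output. This is an illustrative sketch only. The layer sizes, the merge by concatenation, the untrained random weights, and all function names are assumptions, since the abstract does not give these details.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_channel(x, W, U, b):
    """Run a single-layer LSTM over x (T, F); return the last hidden state (H,)."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in x:
        z = x_t @ W + h @ U + b              # all four gates at once, shape (4H,)
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                    # cell state carries long-term memory
        h = o * np.tanh(c)
    return h

def cnn_channel(x, kernels, bias):
    """1-D convolution over time (valid padding), ReLU, global max pooling.

    x: (T, F); kernels: (K, F, C); returns (C,).
    """
    K, T = kernels.shape[0], x.shape[0]
    windows = np.stack([x[t:t + K] for t in range(T - K + 1)])   # (T-K+1, K, F)
    feature_maps = np.einsum("tkf,kfc->tc", windows, kernels) + bias
    return np.maximum(feature_maps, 0.0).max(axis=0)

def predict_next_hour(x, params):
    """Merge both channels by concatenation and apply a linear output layer."""
    h_lstm = lstm_channel(x, params["W"], params["U"], params["b"])
    h_cnn = cnn_channel(x, params["kernels"], params["conv_bias"])
    merged = np.concatenate([h_lstm, h_cnn])
    return float(merged @ params["w_out"] + params["b_out"])

def init_params(n_features, hidden=8, channels=4, kernel=3, seed=0):
    """Random, untrained weights; sizes are arbitrary illustrative choices."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0, 0.1, (n_features, 4 * hidden)),
        "U": rng.normal(0, 0.1, (hidden, 4 * hidden)),
        "b": np.zeros(4 * hidden),
        "kernels": rng.normal(0, 0.1, (kernel, n_features, channels)),
        "conv_bias": np.zeros(channels),
        "w_out": rng.normal(0, 0.1, hidden + channels),
        "b_out": 0.0,
    }

# Hypothetical input: 24 hourly readings of 5 sensor/weather features.
params = init_params(n_features=5)
window = np.random.default_rng(1).normal(size=(24, 5))
prediction = predict_next_hour(window, params)
```

In practice the weights would be learned from the greenhouse and weather data, for example with a deep learning framework rather than hand-written NumPy; the sketch only shows how the two channels see the same window and contribute separate features to one prediction.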
Hsu, Ya-Ling, and 徐雅玲. "Toward Automatic Pain-Level Detection for Emergency Patients using Fusion of CNN and LSTM Multimodal Audio-Video Features." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/kr4cgk.
National Tsing Hua University (國立清華大學)
Department of Electrical Engineering (電機工程學系所)
106
Nowadays, emergency departments are often considered the most efficient way to seek medical care. To allocate healthcare resources effectively, however, a triage classification system plays an important role in assessing the severity of illness of patients boarding in the emergency department. The Taiwan Triage and Acuity Scale (TTAS) lists several factors for triage classification, and the self-reported pain intensity numerical rating scale (NRS) is one of the major modifiers of the current TTAS-based triage system. In clinical practice, physicians and nurses have noted that this instrument is difficult to apply systematically, especially for elderly people, foreigners, or patients with a low education level. As a result, triage nurses often select the pain level from their own observations instead of soliciting an answer from the patient, which undermines the consistency and validity of the triage classification system. In this work, we cooperate with emergency physicians at Linkou Chang Gung Memorial Hospital. We extract multimodal behavioral signals, namely facial expressions and vocal characteristics, from patients and model these behaviors using CNN and LSTM models, respectively. The experimental results show accuracies of 77.1% and 55.7% in two-class and three-class pain recognition, respectively. In further experimental analysis, we also found that pain level is significantly related to the facial expressions and vocal characteristics of patients.
Zhou, Quan. "Bidirectional long short-term memory network for proto-object representation." Thesis, 2018. https://hdl.handle.net/2144/31682.
Raptis, Konstantinos. "The clash between two worlds in human action recognition: supervised feature training vs Recurrent ConvNet." Thesis, 2016. https://doi.org/10.7912/C2CW7G.
Action recognition has been an active research topic for over three decades. It has various applications, such as surveillance, human-computer interaction, and content-based retrieval. Recently, research has focused on datasets of movies, web videos, and TV shows. The nature of these datasets makes action recognition very challenging because of scene variability and complexity, namely background clutter, occlusions, viewpoint changes, fast irregular motion, and a large spatio-temporal search space (articulation configurations and motions). The use of local space and time image features shows promising results, avoiding cumbersome and often inaccurate frame-by-frame segmentation (boundary estimation). We focus on two state-of-the-art methods for the action classification problem: dense trajectories and recurrent neural networks (RNNs). Dense trajectories use typical supervised training (e.g., with Support Vector Machines) of features such as 3D-SIFT, extended SURF, HOG3D, and local trinary patterns; the main idea is to densely sample these features in each frame and track them through the sequence based on optical flow. The deep neural network, on the other hand, uses the input frames to detect actions and produce part proposals, i.e., to estimate information on body parts (shapes and locations). We compare these two approaches, which are representative of what is used today, both qualitatively and numerically, and describe our conclusions with respect to accuracy and efficiency.
Do, Ngoc. "Použití rekurentních neuronových sítí pro automatické rozpoznávání řečníka, jazyka a pohlaví." Master's thesis, 2016. http://www.nusl.cz/ntk/nusl-346774.
Fu, Yang. "Reconnaissance de l'émotion thermique." Thèse, 2017. http://hdl.handle.net/1866/19371.
To improve computer-human interaction in areas such as healthcare, e-learning, and video games, many researchers have studied emotion recognition from text, speech, facial expressions, or electroencephalography (EEG) signals. Among these, emotion recognition using EEG has achieved satisfactory accuracy. However, wearing EEG devices limits the user's range of movement, so a noninvasive method is needed to facilitate emotion detection and its applications. We therefore propose using a thermal camera to capture changes in skin temperature and then applying machine learning algorithms to classify emotion changes accordingly. This thesis contains two studies on thermal emotion detection, each compared with EEG-based emotion detection. The first aimed to establish thermal emotion detection profiles in comparison with EEG-based emotion detection technology; the second implemented an application with deep learning algorithms that visually displays the accuracy and performance of both thermal and EEG-based emotion detection. In the first study, we applied an HMM to thermal emotion recognition and, after comparison with EEG-based emotion detection, identified emotion-related skin temperature features in terms of intensity and rapidity. In the second study, we implemented an emotion detection application that supports both thermal and EEG-based emotion detection using two deep learning methods: the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Thermal-image-based emotion detection achieved an accuracy of 52.59%, and EEG-based detection achieved an accuracy of 67.05%. In future work, we will further adjust the machine learning algorithms to improve the precision of thermal emotion detection.