Journal articles on the topic 'Encoder-Decoder LSTM'

Author: Grafiati

Published: 10 January 2023

Last updated: 28 January 2023

Consult the top 50 journal articles for your research on the topic 'Encoder-Decoder LSTM.'

1

Li, Mingfei, Jiajian Wu, Zhengpeng Chen, Jiangbo Dong, Zhiping Peng, Kai Xiong, Mumin Rao, Chuangting Chen, and Xi Li. "Data-Driven Voltage Prognostic for Solid Oxide Fuel Cell System Based on Deep Learning." Energies 15, no. 17 (August 29, 2022): 6294. http://dx.doi.org/10.3390/en15176294.

Abstract:

A solid oxide fuel cell (SOFC) is an innovative power generation system that is green, efficient, and promising for a wide range of applications. The prediction and evaluation of the operation state of a solid oxide fuel cell system is of great significance for the stable and long-term operation of the power generation system. Prognostics and Health Management (PHM) technology is widely used to perform preventive and predictive maintenance on equipment. Unlike prediction based on the SOFC mechanistic model, the combination of PHM and deep learning has shown wide application prospects. Therefore, this study first obtains an experimental dataset through short-term degradation experiments of a 1 kW SOFC system, and then proposes an encoder-decoder RNN-based SOFC state prediction model. Based on the experimental dataset, the model can accurately predict the voltage variation of the SOFC system. The prediction results of four different prediction models are compared and analyzed, namely, long short-term memory (LSTM), gated recurrent unit (GRU), encoder–decoder LSTM, and encoder–decoder GRU. The results show that for the SOFC test set, the mean square errors of encoder–decoder LSTM and encoder–decoder GRU are 0.015121 and 0.014966, respectively, whereas the corresponding errors of LSTM and GRU are 0.017050 and 0.017456, respectively. The encoder–decoder RNN model displays high prediction precision and improves prediction accuracy; it is expected to be combined with control strategies and to further help the implementation of PHM in fuel cells.
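The size of the improvement implied by the four MSE values quoted in this abstract can be checked with a short calculation. The sketch below uses only those reported numbers; the function name `relative_reduction` is illustrative and does not come from the paper.

```python
# MSE values quoted in the abstract above (SOFC test set).
mse = {
    "LSTM": 0.017050,
    "GRU": 0.017456,
    "encoder-decoder LSTM": 0.015121,
    "encoder-decoder GRU": 0.014966,
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the error relative to `baseline`."""
    return (baseline - improved) / baseline


lstm_gain = relative_reduction(mse["LSTM"], mse["encoder-decoder LSTM"])
gru_gain = relative_reduction(mse["GRU"], mse["encoder-decoder GRU"])

print(f"LSTM -> encoder-decoder LSTM: {lstm_gain:.1%}")
print(f"GRU  -> encoder-decoder GRU:  {gru_gain:.1%}")
```

On these figures the encoder-decoder variants lower the test MSE by roughly 11% (LSTM) and 14% (GRU) relative to their plain counterparts.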

2

Subramanian, Sowkarthika, Yasoda Kailasa Gounder, and Sumathi Lingana. "Day-ahead solar irradiance forecast using sequence-to-sequence model with attention mechanism." Indonesian Journal of Electrical Engineering and Computer Science 25, no. 2 (February 1, 2022): 900. http://dx.doi.org/10.11591/ijeecs.v25.i2.pp900-909.

Abstract:

The increasing integration of distributed energy resources (DERs) into the power grid makes it significant to forecast solar irradiance for power system planning. With the advent of deep learning techniques, it is possible to forecast solar irradiance accurately for a longer time. In this paper, day-ahead solar irradiance is forecasted using encoder-decoder sequence-to-sequence models with attention mechanism. This study formulates the problem as structured multivariate forecasting and comprehensive experiments are made with the data collected from the National Solar Radiation Database (NSRDB). Two error metrics are adopted to measure the errors of encoder-decoder sequence-to-sequence model and compared with smart persistence (SP), back propagation neural network (BPNN), recurrent neural network (RNN), long short-term memory (LSTM) and encoder-decoder sequence-to-sequence LSTM with attention mechanism (Enc-Dec-LSTM). Compared with SP, BPNN and RNN, Enc-Dec-LSTM is more accurate and has reduced forecast error of 31.1%, 19.3% and 8.5% respectively for day-ahead solar irradiance forecast with 31.07% as forecast skill.

3

Yolchuyeva, Sevinj, Géza Németh, and Bálint Gyires-Tóth. "Grapheme-to-Phoneme Conversion with Convolutional Neural Networks." Applied Sciences 9, no. 6 (March 18, 2019): 1143. http://dx.doi.org/10.3390/app9061143.

Abstract:

Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form. It has a highly essential role for natural language processing, text-to-speech synthesis and automatic speech recognition systems. In this paper, we investigate convolutional neural networks (CNN) for G2P conversion. We propose a novel CNN-based sequence-to-sequence (seq2seq) architecture for G2P conversion. Our approach includes an end-to-end CNN G2P conversion with residual connections and, furthermore, a model that utilizes a convolutional neural network (with and without residual connections) as encoder and Bi-LSTM as a decoder. We compare our approach with state-of-the-art methods, including Encoder-Decoder LSTM and Encoder-Decoder Bi-LSTM. Training and inference times, phoneme and word error rates were evaluated on the public CMUDict dataset for US English, and the best performing convolutional neural network-based architecture was also evaluated on the NetTalk dataset. Our method approaches the accuracy of previous state-of-the-art results in terms of phoneme error rate.

4

Zhou, Shengwen, Shunsheng Guo, Baigang Du, Shuo Huang, and Jun Guo. "A Hybrid Framework for Multivariate Time Series Forecasting of Daily Urban Water Demand Using Attention-Based Convolutional Neural Network and Long Short-Term Memory Network." Sustainability 14, no. 17 (September 5, 2022): 11086. http://dx.doi.org/10.3390/su141711086.

Abstract:

Urban water demand forecasting is beneficial for reducing the waste of water resources and enhancing environmental protection in sustainable water management. However, it is a challenging task to accurately predict water demand affected by a range of factors with nonlinear and uncertain temporal patterns. This paper proposes a new hybrid framework for urban daily water demand with multiple variables, called the attention-based CNN-LSTM model, which combines convolutional neural network (CNN), long short-term memory (LSTM), attention mechanism (AM), and encoder-decoder network. CNN layers are used to learn the representation and correlation between multivariate variables. LSTM layers are utilized as the building blocks of the encoder-decoder network to capture temporal characteristics from the input sequence, while AM is introduced to the encoder-decoder network to assign corresponding attention according to the importance of water demand multivariable time series at different times. The new hybrid framework considers correlation between multiple variables and neglects irrelevant data points, which helps to improve the prediction accuracy of multivariable time series. The proposed model is contrasted with the LSTM model, the CNN-LSTM model, and the attention-based LSTM to predict the daily water demand time series in Suzhou, China. The results show that the hybrid model achieves higher prediction performance with the smallest mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE), and the largest correlation coefficient (R²).
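The four evaluation measures named in this abstract have standard textbook definitions, sketched below. This is not the authors' code. The demand values are hypothetical, and R² is computed as the coefficient of determination, which is an assumption, since the abstract calls it a correlation coefficient.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if a true value is 0)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily demand values and forecasts, for illustration only.
observed = [100.0, 110.0, 120.0, 130.0]
predicted = [98.0, 112.0, 119.0, 133.0]

print(mae(observed, predicted), rmse(observed, predicted))
print(mape(observed, predicted), r2(observed, predicted))
```

Lower MAE, RMSE, and MAPE and a higher R² all point the same way, which is why the abstract reports the smallest of the first three together with the largest of the last.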

5

Geng, Yaogang, Hongyan Mei, Xiaorong Xue, and Xing Zhang. "Image-Caption Model Based on Fusion Feature." Applied Sciences 12, no. 19 (September 30, 2022): 9861. http://dx.doi.org/10.3390/app12199861.

Abstract:

The encoder–decoder framework is the main frame of image captioning. The convolutional neural network (CNN) is usually used to extract grid-level features of the image, and the graph convolutional neural network (GCN) is used to extract the image’s region-level features. Grid-level features are poor in semantic information, such as the relationship and location of objects, while regional features lack fine-grained information about images. To address this problem, this paper proposes a fusion-features-based image-captioning model, which includes the fusion feature encoder and LSTM decoder. The fusion-feature encoder is divided into grid-level feature encoder and region-level feature encoder. The grid-level feature encoder is a convolutional neural network with embedded squeeze and excitation operations so that the model can focus on features that are highly correlated to the title. The region-level encoder employs node-embedding matrices to enable models to understand different node types and gain richer semantics. Then the features are weighted together by an attention mechanism to guide the decoder LSTM to generate an image caption. Our model was trained and tested on the MS COCO 2014 dataset, achieving a BLEU-4 score of 0.399 and a CIDEr score of 1.311. The experimental results indicate that the model can describe the image in detail.

6

Zhang, Wei, Shangmin Luan, and Liqin Tian. "A Rapid Combined Model for Automatic Generating Web UI Codes." Wireless Communications and Mobile Computing 2022 (February 8, 2022): 1–10. http://dx.doi.org/10.1155/2022/4415479.

Abstract:

The encoder-decoder network is usually applied to image captioning to automatically generate descriptive text for a picture. A web user interface (Web UI) is a special type of image and is usually described by HTML (hypertext markup language). Consequently, it becomes possible to use the encoder-decoder network to generate the corresponding code from a screenshot of a Web UI. The basic structure of the decoder is RNN, LSTM, GRU, or other recurrent neural networks. However, this kind of decoder needs a long training time, so it increases the time complexity of training and prediction. HTML is a typical structured language for describing the Web UI, but it is hard to express the timing characteristics of the word sequence and the complex context. To resolve these problems efficiently, a rapid combined model (RCM) is designed in this paper. The basic structure of the RCM is an encoder-decoder network. The word embedding matrix and visual model are included in the encoder. The word embedding matrix uses fully connected units. Compared with LSTM, the accuracy of the word embedding matrix is basically unchanged, but the training and prediction speed have been significantly improved. In the visual model, the pretrained InceptionV3 network is used to generate the image vector, which not only improves the quality of the recognition of the Web UI image but also reduces the training time of the RCM significantly. In the decoder, the word embedding vector and the image vector are integrated together and input into the prediction model for word prediction.

7

Luo, Tao, Xudong Cao, Jin Li, Kun Dong, Rui Zhang, and Xueliang Wei. "Multi-task prediction model based on ConvLSTM and encoder-decoder." Intelligent Data Analysis 25, no. 2 (March 4, 2021): 359–82. http://dx.doi.org/10.3233/ida-194969.

Abstract:

The energy load data in the micro-energy network are a time series with sequential and nonlinear characteristics. This paper proposes a model based on the encoder-decoder architecture and ConvLSTM for multi-scale prediction of multi-energy loads in the micro-energy network. We apply ConvLSTM, LSTM, attention mechanism and multi-task learning concepts to construct a model specifically for processing the energy load forecasting of the micro-energy network. In this paper, ConvLSTM is used to encode the input time series. The attention mechanism is used to assign different weights to the features, which are subsequently decoded by the decoder LSTM layer. Finally, the fully connected layer interprets the output. This model is applied to forecast the multi-energy load data of the micro-energy network in a certain area of Northwest China. The test results prove that our model is convergent, and the evaluation index value of the model is better than that of the multi-task FC-LSTM and the single-task FC-LSTM. In particular, the application of the attention mechanism makes the model converge faster and with higher precision.
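The weighting step described in this abstract can be illustrated with a generic attention computation. The abstract does not state the scoring function, so the sketch assumes dot-product scoring followed by a softmax. The encoder states and the query vector are hypothetical values.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(encoder_states, query):
    """Score each encoder state against the query, normalise the scores,
    and return the weights plus the weighted sum (the context vector)."""
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context


# Hypothetical encoder outputs for three time steps (two features each).
encoder_states = [[0.1, 0.3], [0.9, 0.2], [0.4, 0.8]]
query = [1.0, 0.0]  # hypothetical decoder state

weights, context = attend(encoder_states, query)
print(weights)
print(context)
```

The second time step receives the largest weight here because its state aligns best with the query. A decoder would consume the context vector in place of a single fixed encoder summary.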

8

Thakare, Abhijeet Ramesh, and Preeti Voditel. "Extractive Text Summarization Using LSTM-Based Encoder-Decoder Classification." ECS Transactions 107, no. 1 (April 24, 2022): 11665–72. http://dx.doi.org/10.1149/10701.11665ecst.

Abstract:

Nowadays, text summarization is one of the important areas to be focused on. As the World Wide Web is growing, a huge amount of text articles (especially blogs, scientific articles) are also generated on the internet. Automatic text summarization is one of the important techniques to shorten the original text in such a way that the shortened or summarized text covers incisive and meaningful sentences of the original huge text. Extractive summarization extracts important sentences from original documents and then aggregates all these sentences to generate the summary. We have proposed a novel LSTM-based encoder-decoder, which plays a vital role in the extractive text summarization process. The CNN news article dataset is utilized for training our model. Our model is evaluated on standard metrics like Gold Standard, Recall-Oriented Understudy for Gisting Evaluation (ROUGE)-1, and ROUGE-2. After evaluation, our model achieved an average F1-Score of 0.8353. Our model also outperformed other models available in the literature.
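For readers unfamiliar with the n-gram overlap scores mentioned in this abstract, here is a minimal sketch of a ROUGE-N F1 computation with n = 1 and n = 2. It assumes whitespace tokenisation and lowercasing, with no stemming or stop-word handling, so its numbers will not match official ROUGE tooling. The example sentences are invented.

```python
from collections import Counter


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """F1 over clipped n-gram overlap between a candidate and a reference."""

    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Invented example sentences.
summary = "the cat sat on the mat"
gold = "the cat lay on the mat"

print(rouge_n_f1(summary, gold, n=1))  # unigram overlap
print(rouge_n_f1(summary, gold, n=2))  # bigram overlap
```

Five of six unigrams and three of five bigrams match in this example, so the bigram score is noticeably lower than the unigram score.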

9

Oveisi, Shahrzad, Ali Moeini, and Sayeh Mirzaei. "LSTM Encoder-Decoder Dropout Model in Software Reliability Prediction." International Journal of Reliability, Risk and Safety: Theory and Application 4, no. 2 (December 1, 2021): 1–12. http://dx.doi.org/10.30699/ijrrs.4.2.1.

10

Kapočiūtė-Dzikienė, Jurgita. "A Domain-Specific Generative Chatbot Trained from Little Data." Applied Sciences 10, no. 7 (March 25, 2020): 2221. http://dx.doi.org/10.3390/app10072221.

Abstract:

Accurate generative chatbots are usually trained on large datasets of question–answer pairs. Although such datasets do not exist for some languages, this does not reduce the need for companies to have chatbot technology on their websites. However, companies usually own small domain-specific datasets (at least in the form of an FAQ) about their products, services, or used technologies. In this research, we seek effective solutions to create generative seq2seq-based chatbots from very small data. Since experiments are carried out in English and morphologically complex Lithuanian languages, we have an opportunity to compare results for languages with very different characteristics. We experimentally explore three encoder–decoder LSTM-based approaches (simple LSTM, stacked LSTM, and BiLSTM), three word embedding types (one-hot encoding, fastText, and BERT embeddings), and five encoder–decoder architectures based on different encoder and decoder vectorization units. Furthermore, all offered approaches are applied to the pre-processed datasets with removed and separated punctuation. The experimental investigation revealed the advantages of the stacked LSTM and BiLSTM encoder architectures and BERT embedding vectorization (especially for the encoder). The best achieved BLEU on English/Lithuanian datasets with removed and separated punctuation was ~0.513/~0.505 and ~0.488/~0.439, respectively. Better results were achieved with the English language, because generating different inflection forms for the morphologically complex Lithuanian is a harder task. The BLEU scores fell into the range defining the quality of the generated answers as good or very good for both languages. This research was performed with very small datasets having little variety in covered topics, which makes this research not only more difficult, but also more interesting. Moreover, to our knowledge, it is the first attempt to train generative chatbots for a morphologically complex language.

11

Lu, Yao, Linqing Liu, Zhile Jiang, Min Yang, and Randy Goebel. "A Multi-Task Learning Framework for Abstractive Text Summarization." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9987–88. http://dx.doi.org/10.1609/aaai.v33i01.33019987.

Abstract:

We propose a Multi-task learning approach for Abstractive Text Summarization (MATS), motivated by the fact that humans have no difficulty performing such a task because they have the capabilities of multiple domains. Specifically, MATS consists of three components: (i) a text categorization model that learns rich category-specific text representations using a bi-LSTM encoder; (ii) a syntax labeling model that learns to improve the syntax-aware LSTM decoder; and (iii) an abstractive text summarization model that shares its encoder and decoder with the text categorization and the syntax labeling tasks, respectively. In particular, the abstractive text summarization model enjoys significant benefit from the additional text categorization and syntax knowledge. Our experimental results show that MATS outperforms the competitors.

12

Prakash, Kolla Bhanu. "Chatterbot implementation using Transfer Learning and LSTM Encoder-Decoder Architecture." International Journal of Emerging Trends in Engineering Research 8, no. 5 (May 25, 2020): 1709–15. http://dx.doi.org/10.30534/ijeter/2020/35852020.

13

Chen, Kai, Xiao Song, Daolin Han, Jinghan Sun, Yong Cui, and Xiaoxiang Ren. "Pedestrian behavior prediction model with a convolutional LSTM encoder–decoder." Physica A: Statistical Mechanics and its Applications 560 (December 2020): 125132. http://dx.doi.org/10.1016/j.physa.2020.125132.

14

Zhao, Yun, Xiuguo Zhang, Zijing Shang, and Zhiying Cao. "DA-LSTM-VAE: Dual-Stage Attention-Based LSTM-VAE for KPI Anomaly Detection." Entropy 24, no. 11 (November 5, 2022): 1613. http://dx.doi.org/10.3390/e24111613.

Abstract:

To ensure the normal operation of the system, the enterprise’s operations engineer monitors the system through KPIs (key performance indicators), for example, web page visits and server memory utilization. KPI anomaly detection is a core technology, which is of great significance for rapid fault detection and repair. This paper proposes a novel dual-stage attention-based LSTM-VAE (DA-LSTM-VAE) model for KPI anomaly detection. Firstly, in order to capture time correlation in KPI data, long short-term memory (LSTM) units are used to replace traditional neurons in the variational autoencoder (VAE). Then, in order to improve the effect of KPI anomaly detection, an attention mechanism is introduced into the input stage of the encoder and decoder, respectively. During the input stage of the encoder, a time attention mechanism is adopted to assign different weights to different time points, which can adaptively select important input sequences to avoid the influence of noise in the data. During the input stage of the decoder, a feature attention mechanism is adopted to adaptively select important latent variable representations, which can capture the long-term dependence of time series better. In addition, this paper proposes an adaptive threshold method based on anomaly scores measured by reconstruction probability, which can minimize false positives and false negatives and avoid manual adjustment of the threshold. Experimental results on a public dataset show that the proposed method outperforms other baseline methods.

15

Huang, Kefan, Kevin P. Hallinan, Robert Lou, Abdulrahman Alanezi, Salahaldin Alshatshati, and Qiancheng Sun. "Self-Learning Algorithm to Predict Indoor Temperature and Cooling Demand from Smart WiFi Thermostat in a Residential Building." Sustainability 12, no. 17 (August 31, 2020): 7110. http://dx.doi.org/10.3390/su12177110.

Abstract:

Smart WiFi thermostats have moved well beyond the function they were originally designed for; namely, controlling heating and cooling comfort in buildings. They are now also learning from occupant behaviors and permit occupants to control their comfort remotely. This research seeks to go beyond this state of the art by utilizing smart WiFi thermostat data in residences to develop dynamic predictive models for room temperature and cooling/heating demand. These models can then be used to estimate the energy savings from new thermostat temperature schedules and estimate peak load reduction achievable from maintaining a residence in a minimum thermal comfort condition. Back Propagation Neural Network (BPNN), Long-Short Term Memory (LSTM), and Encoder-Decoder LSTM dynamic models are explored. Results demonstrate that LSTM outperforms the BPNN and Encoder-Decoder LSTM approaches, yielding an MAE of 0.5 °C, equal to the resolution error of the measured temperature. Additionally, the models developed are shown to be highly accurate in predicting savings from aggressive thermostat set point schedules, yielding deep reduction of up to 14.3% for heating and cooling, as well as significant energy reduction from curtailed thermal comfort in response to a high demand event.

16

Li, Chen, Junjun Zheng, Hiroyuki Okamura, and Tadashi Dohi. "Software Reliability Prediction through Encoder-Decoder Recurrent Neural Networks." International Journal of Mathematical, Engineering and Management Sciences 7, no. 3 (May 8, 2022): 325–40. http://dx.doi.org/10.33889/ijmems.2022.7.3.022.

Abstract:

With the growing demand for highly reliable and safe software, software reliability prediction has attracted increasing attention as a way to identify potential faults in software. Software reliability growth models (SRGMs) are the most commonly used prediction models in practical software reliability engineering. However, their unrealistic assumptions and environment-dependent applicability restrict their development. Recurrent neural networks (RNNs), such as the long short-term memory (LSTM), provide an end-to-end learning method, have shown a remarkable ability in time-series forecasting, and can be used to solve the above problem for software reliability prediction. In this paper, we present an attention-based encoder-decoder RNN called EDRNN to predict the number of failures in the software. More specifically, the encoder-decoder RNN estimates the cumulative faults with the fault detection time as input. The attention mechanism improves the prediction accuracy in the encoder-decoder architecture. Experimental results demonstrate that our proposed model outperforms other traditional SRGMs and neural network-based models in terms of accuracy.

17

Wei, Zhangping, and Hai Cong Nguyen. "Storm Surge Forecast Using an Encoder–Decoder Recurrent Neural Network Model." Journal of Marine Science and Engineering 10, no. 12 (December 12, 2022): 1980. http://dx.doi.org/10.3390/jmse10121980.

Abstract:

This study presents an encoder–decoder neural network model to forecast storm surges on the US North Atlantic Coast. The proposed multivariate time-series forecast model consists of two long short-term memory (LSTM) models. The first LSTM model encodes the input sequence, including storm position, central pressure, and the radius of the maximum winds to an internal state. The second LSTM model decodes the internal state to forecast the storm surge water level and velocity. The neural network model was developed based on a storm surge dataset generated by the North Atlantic Comprehensive Coastal Study using a physics-based storm surge model. The neural network model was trained to predict storm surges at three forecast lead times ranging from 3 h to 12 h by learning the correlation between the past storm conditions and future storm hazards. The results show that the computationally efficient neural network model can forecast a storm in a fraction of one second. The neural network model not only forecasts peak surges, but also predicts the time-series profile of a storm. Furthermore, the model is highly versatile, and it can forecast storm surges generated by different sizes and strengths of bypassing and landfalling storms. Overall, this work demonstrates the success of data-driven approaches to improve coastal hazard research.
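The two-LSTM arrangement described in this abstract, where one LSTM encodes the input sequence into an internal state and a second decodes that state into a forecast, can be sketched in plain Python. This is an untrained toy under stated assumptions, not the authors' model. The weights are random, the decoder emits one scalar per step rather than water level and velocity, and all names (`LSTMCell`, `forecast`, `readout`) and input values are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Single LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, candidate, output.
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                         for _ in range(hidden_size)] for gate in "ifgo"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate the input with the previous hidden state
        pre = {gate: [sum(w * v for w, v in zip(row, z)) + bias
                      for row, bias in zip(self.W[gate], self.b[gate])]
               for gate in "ifgo"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        cand = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        c_new = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c, i, cand)]
        h_new = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_new)]
        return h_new, c_new


def forecast(encoder, decoder, readout, inputs, horizon):
    h = [0.0] * encoder.hidden_size
    c = [0.0] * encoder.hidden_size
    for x in inputs:  # the encoder consumes the past sequence
        h, c = encoder.step(x, h, c)
    outputs, y = [], [0.0]  # the decoder starts from a zero "previous output"
    for _ in range(horizon):  # the decoder unrolls from the encoder's final state
        h, c = decoder.step(y, h, c)
        y = [sum(w * v for w, v in zip(readout, h))]  # linear readout to a scalar
        outputs.append(y[0])
    return outputs


rng = random.Random(0)
hidden = 8
encoder = LSTMCell(input_size=3, hidden_size=hidden, rng=rng)
decoder = LSTMCell(input_size=1, hidden_size=hidden, rng=rng)
readout = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]

# Hypothetical normalised inputs per step: position, central pressure, wind radius.
past = [[0.1, 0.5, 0.3], [0.2, 0.4, 0.3], [0.3, 0.35, 0.4]]
surge = forecast(encoder, decoder, readout, past, horizon=4)
print(surge)
```

The only link between the two networks is the pair `(h, c)` handed over after the last input step, so the encoder and decoder must share a hidden size. A trained model would fit the weights to storm surge data rather than sample them at random.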

18

Chu, Yan, Xiao Yue, Lei Yu, Mikhailov Sergei, and Zhengkui Wang. "Automatic Image Captioning Based on ResNet50 and LSTM with Soft Attention." Wireless Communications and Mobile Computing 2020 (October 20, 2020): 1–7. http://dx.doi.org/10.1155/2020/8909458.

Abstract:

Captioning the images with proper descriptions automatically has become an interesting and challenging problem. In this paper, we present one joint model AICRL, which is able to conduct the automatic image captioning based on ResNet50 and LSTM with soft attention. AICRL consists of one encoder and one decoder. The encoder adopts ResNet50 based on the convolutional neural network, which creates an extensive representation of the given image by embedding it into a fixed length vector. The decoder is designed with LSTM, a recurrent neural network and a soft attention mechanism, to selectively focus the attention over certain parts of an image to predict the next sentence. We have trained AICRL over a big dataset MS COCO 2014 to maximize the likelihood of the target description sentence given the training images and evaluated it in various metrics like BLEU, METEOR, and CIDEr. Our experimental results indicate that AICRL is effective in generating captions for the images.

19

Waseem, Khawaja Hassan, Hammad Mushtaq, Fazeel Abid, Adnan M. Abu-Mahfouz, Asadullah Shaikh, Mehmet Turan, and Jawad Rasheed. "Forecasting of Air Quality Using an Optimized Recurrent Neural Network." Processes 10, no. 10 (October 18, 2022): 2117. http://dx.doi.org/10.3390/pr10102117.

Abstract:

Clean air is necessary for leading a healthy life. Many respiratory illnesses have their root in the poor quality of air across regions. Due to the tremendous impact of air quality on people’s lives, it is essential to devise a mechanism through which air pollutants (PM2.5, NOx, COx, SOx) can be forecasted. However, forecasting air quality and its pollutants is complicated as air quality depends on several factors such as weather, vehicular, and power plant emissions. The aim of this research was to find the impact of weather on PM2.5 concentrations and to forecast the daily and hourly PM2.5 concentration for the next 30 days and 72 h in Pakistan. This forecasting was done through state-of-the-art deep learning and machine learning models such as FbProphet, LSTM, and LSTM encoder–decoder. This research also successfully forecasted the proposed daily and hourly PM2.5 concentration. The LSTM encoder–decoder had the best performance and successfully forecasted PM2.5 concentration with a mean absolute percentage error (MAPE) of 28.2%, 15.07%, and 42.1% daily, and 11.75%, 9.5%, and 7.4% hourly for different cities in Pakistan. This research proves that a data-driven approach is essential for resolving air pollution in Pakistan.

20

Banda, Anish. "Image Captioning using CNN and LSTM." International Journal for Research in Applied Science and Engineering Technology 9, no. 8 (August 31, 2021): 2666–69. http://dx.doi.org/10.22214/ijraset.2021.37846.

Abstract:

In the model we propose, we examine a deep neural network-based image caption generation technique. Given an image as input, the technique produces output in three forms: a sentence describing the image in three different languages, an mp3 audio file, and an image file. The model uses techniques from both computer vision and natural language processing. We aim to build a caption generation model using a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The target image is compared with the images in a large training dataset; this is done by the convolutional neural network. The model generates a decent description using the trained data. To extract features from images we need an encoder, and we use a CNN as the encoder. To decode the generated image description we use an LSTM. To evaluate the accuracy of the generated caption we use the BLEU metric, which grades the quality of the generated content. Performance is calculated by the standard calculation matrices.

Keywords: CNN, RNN, LSTM, BLEU score, encoder, decoder, captions, image description.
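The BLEU metric mentioned in this abstract combines clipped n-gram precisions with a brevity penalty. The sketch below is a simplified sentence-level variant that assumes a single reference, whitespace tokenisation, n-grams up to length 2, and no smoothing. Standard BLEU is usually computed at corpus level with n up to 4, so these numbers are illustrative only. The captions are invented.

```python
import math
from collections import Counter


def bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified single-reference BLEU with uniform n-gram weights."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum((cand_ngrams & ref_ngrams).values())
        total = sum(cand_ngrams.values())
        if clipped == 0 or total == 0:
            return 0.0  # no smoothing: any zero precision gives a zero score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise candidates that are no longer than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


# Invented captions.
generated = "a dog runs on the grass"
truth = "a dog is running on the grass"

score = bleu(generated, truth)
print(round(score, 3))
```

A caption identical to its reference scores 1.0 under this sketch. The generated caption above is penalised both for its missing n-grams and for being shorter than the reference.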

21

Bappy, Jawadul H., Cody Simons, Lakshmanan Nataraj, B. S. Manjunath, and Amit K. Roy-Chowdhury. "Hybrid LSTM and Encoder–Decoder Architecture for Detection of Image Forgeries." IEEE Transactions on Image Processing 28, no. 7 (July 2019): 3286–300. http://dx.doi.org/10.1109/tip.2019.2895466.

22

Lu, Kuan, Wen Xue Sun, Xin Wang, Xiang Rong Meng, Yong Zhai, Hong Hai Li, and Rong Gui Zhang. "Short-term Wind Power Prediction Model Based on Encoder-Decoder LSTM." IOP Conference Series: Earth and Environmental Science 186 (October 11, 2018): 012020. http://dx.doi.org/10.1088/1755-1315/186/5/012020.

23

Habler, Edan, and Asaf Shabtai. "Using LSTM encoder-decoder algorithm for detecting anomalous ADS-B messages." Computers & Security 78 (September 2018): 155–73. http://dx.doi.org/10.1016/j.cose.2018.07.004.

24

Ren, Jiayang, and Dong Ni. "A batch-wise LSTM-encoder decoder network for batch process monitoring." Chemical Engineering Research and Design 164 (December 2020): 102–12. http://dx.doi.org/10.1016/j.cherd.2020.09.019.

25

Zechin, Douglas, Matheus Basso do Amaral, and Helena Beatriz Bettella Cybis. "Previsão de velocidades de tráfego com rede neural LSTM encoder-decoder." TRANSPORTES 30, no. 3 (December 14, 2022): 2660. http://dx.doi.org/10.14295/transportes.v30i3.2660.

Abstract:

This article proposes a speed prediction model for a highway segment in the city of Porto Alegre that experiences daily congestion due to bottlenecks. The predictions use traffic data and environmental variables such as rainfall intensity, accidents, and atypical events. A neural network model with an encoder-decoder architecture and long short-term memory (LSTM) layers is proposed; these layers can establish long-term temporal dependencies among the input variables, which makes them relevant for applications in transportation. As additional contributions, the quality of the predictions was evaluated for different prediction horizons and traffic regimes, and the capacity and breakdown probability curves calculated from field data were compared with those calculated from predicted data. The methodology performed satisfactorily on both criteria and was able to make good predictions even in critical traffic situations.


26

Choi, Dongho, Janghyuk Yim, Minjin Baek, and Sangsun Lee. "Machine Learning-Based Vehicle Trajectory Prediction Using V2V Communications and On-Board Sensors." Electronics 10, no. 4 (February 9, 2021): 420. http://dx.doi.org/10.3390/electronics10040420.


Abstract:

Predicting the trajectories of surrounding vehicles is important to avoid or mitigate collisions with traffic participants. However, due to limited past information and the uncertainty in future driving maneuvers, trajectory prediction is a challenging task. Recently, trajectory prediction models using machine learning algorithms have been proposed to solve this problem. In this paper, we present a trajectory prediction method based on the random forest (RF) algorithm and the long short-term memory (LSTM) encoder-decoder architecture. An occupancy grid map is first defined for the region surrounding the target vehicle, and then the row and the column that will be occupied by the target vehicle at future time steps are determined using the RF algorithm and the LSTM encoder-decoder architecture, respectively. For the collection of training data, the test vehicle was equipped with a camera and LIDAR sensors along with vehicular wireless communication devices, and the experiments were conducted under various driving scenarios. The vehicle test results demonstrate that the proposed method provides more robust trajectory prediction compared with existing trajectory prediction methods.


27

Chang, Yeong-Hwa, Yen-Jen Chen, Ren-Hung Huang, and Yi-Ting Yu. "Enhanced Image Captioning with Color Recognition Using Deep Learning Methods." Applied Sciences 12, no. 1 (December 26, 2021): 209. http://dx.doi.org/10.3390/app12010209.


Abstract:

Automatically describing the content of an image is an interesting and challenging task in artificial intelligence. In this paper, an enhanced image captioning model that includes object detection, color analysis, and image captioning is proposed to automatically generate textual descriptions of images. In the encoder–decoder model for image captioning, VGG16 is used as the encoder and an LSTM (long short-term memory) network with attention is used as the decoder. In addition, Mask R-CNN with OpenCV is used for object detection and color analysis. The image caption and the color recognition results are then integrated to provide better descriptive details of images. Moreover, the generated sentence is converted into speech. The validation results illustrate that the proposed method can provide more accurate descriptions of images.


28

Zou, Xiangyu, Jinjin Zhao, Duan Zhao, Bin Sun, Yongxin He, and Stelios Fuentes. "Air Quality Prediction Based on a Spatiotemporal Attention Mechanism." Mobile Information Systems 2021 (February 19, 2021): 1–12. http://dx.doi.org/10.1155/2021/6630944.


Abstract:

With the rapid development of the Internet of Things and Big Data, smart cities have received increasing attention. Predicting air quality accurately and efficiently is an important part of building a smart city. However, air quality prediction is very challenging because it is affected by many complex factors, such as dynamic spatial correlation between air quality detection sensors, dynamic temporal correlation, and external factors (such as road networks and points of interest). Therefore, this paper proposes a long short-term memory (LSTM) air quality prediction model based on a spatiotemporal attention mechanism (STA-LSTM). The model uses an encoder-decoder structure to model spatiotemporal features. A spatial attention mechanism is introduced in the encoder to capture the relative influence of surrounding sites on the prediction area. A temporal attention mechanism is introduced in the decoder to capture the time dependence of air quality. In addition, for spatial data such as points of interest (POI) and road networks, this paper uses the LINE graph embedding method to obtain a low-dimensional vector representation that captures abundant spatial features. This paper evaluates STA-LSTM on the Beijing dataset, using the root mean square error (RMSE) and R-squared (R²) indicators to compare it with six benchmarks. The experimental results show that the proposed model outperforms the other benchmarks.
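
As a reading aid for the two indicators named in the abstract above, the following minimal sketch computes RMSE and R-squared from their standard textbook definitions. The function names `rmse` and `r_squared` are illustrative assumptions and do not come from the paper, which does not publish its evaluation code in this listing.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot
```

A lower RMSE and an R-squared closer to 1 both indicate a better fit, which is the sense in which the two indicators are used to rank models.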


29

Javaloy, Adrián, and Ginés García-Mateos. "Text Normalization Using Encoder–Decoder Networks Based on the Causal Feature Extractor." Applied Sciences 10, no. 13 (June 30, 2020): 4551. http://dx.doi.org/10.3390/app10134551.


Abstract:

The encoder–decoder architecture is a well-established, effective and widely used approach in many tasks of natural language processing (NLP), among other domains. It consists of two closely collaborating components: an encoder that transforms the input into an intermediate form, and a decoder that produces the output. This paper proposes a new method for the encoder, named Causal Feature Extractor (CFE), based on three main ideas: causal convolutions, dilation and bidirectionality. We apply this method to text normalization, a ubiquitous problem that appears as the first step of many text-to-speech (TTS) systems. Given a text with symbols, the problem consists of writing the text exactly as it should be read by the TTS system. We use an attention-based encoder–decoder architecture with a fine-grained character-level approach rather than the usual word-level one. The proposed CFE is compared to other common encoders, such as convolutional neural networks (CNN) and long short-term memory (LSTM) networks. Experimental results show the feasibility of CFE, which achieves better results in terms of accuracy, number of parameters, convergence time, and use of an attention mechanism based on attention matrices. The obtained accuracy ranges from 83.5% to 96.8% of correctly normalized sentences, depending on the dataset. Moreover, the proposed method is generic and can be applied to different types of input such as text, audio and images.


30

Javaloy, Adrián, and Ginés García-Mateos. "Preliminary Results on Different Text Processing Tasks Using Encoder-Decoder Networks and the Causal Feature Extractor." Applied Sciences 10, no. 17 (August 20, 2020): 5772. http://dx.doi.org/10.3390/app10175772.


Abstract:

Deep learning methods are gaining popularity in different application domains, and especially in natural language processing. It is commonly believed that, with a large enough dataset and an adequate network architecture, almost any processing problem can be solved. A frequent and widely used typology is the encoder-decoder architecture, where the input data is transformed into an intermediate code by an encoder, and a decoder then takes this code to produce its output. Different types of networks can be used in the encoder and the decoder, depending on the problem of interest, such as convolutional neural networks (CNN) or long short-term memory (LSTM) networks. This paper uses a recently proposed method for the encoder, called Causal Feature Extractor (CFE). It is based on causal convolutions (i.e., convolutions that depend only on one direction of the input), dilation (i.e., increasing the aperture size of the convolutions) and bidirectionality (i.e., independent networks in both directions). Some preliminary results are presented on three different tasks and compared with state-of-the-art methods: bilingual translation, LaTeX decompilation and audio transcription. The proposed method achieves promising results, showing its versatility in working with text, audio and images. Moreover, it has a shorter training time, requiring less time per iteration, and makes good use of attention mechanisms based on attention matrices.
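
The three ideas defined in parentheses in the abstract above can be illustrated with a small pure-Python sketch. This is not the authors' Causal Feature Extractor: it is a single-channel toy with hypothetical function names (`causal_dilated_conv1d`, `bidirectional_causal`), assuming zero padding before the start of the sequence, and it only shows what "causal", "dilated" and "bidirectional" mean for a one-dimensional convolution.

```python
from typing import List, Sequence, Tuple


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """y[t] depends only on x[t], x[t - d], x[t - 2d], ... and never on later samples."""
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation  # look back k * dilation steps
            if idx >= 0:            # positions before the start act as zero padding
                acc += w * x[idx]
        y.append(acc)
    return y


def bidirectional_causal(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[Tuple[float, float]]:
    """Run independent causal passes left-to-right and right-to-left, then pair them."""
    forward = causal_dilated_conv1d(x, kernel, dilation)
    reversed_x = list(x)[::-1]
    backward = causal_dilated_conv1d(reversed_x, kernel, dilation)[::-1]
    return list(zip(forward, backward))
```

Raising `dilation` widens the span of past samples each output can see without adding kernel weights, which is the usual motivation for dilated convolutions.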


31

Gao, Miao, and Guo-You Shi. "Ship-Collision Avoidance Decision-Making Learning of Unmanned Surface Vehicles with Automatic Identification System Data Based on Encoder—Decoder Automatic-Response Neural Networks." Journal of Marine Science and Engineering 8, no. 10 (September 27, 2020): 754. http://dx.doi.org/10.3390/jmse8100754.


Abstract:

Intelligent unmanned surface vehicle (USV) collision avoidance is a complex inference problem based on current navigation status. This requires simultaneous processing of the input sequences and generation of the response sequences. The automatic identification system (AIS) encounter data mainly include the time-series data of two AIS sets, which exhibit a one-to-one mapping relation. Herein, an encoder–decoder automatic-response neural network is designed and implemented based on the sequence-to-sequence (Seq2Seq) structure to simultaneously process the two AIS encounter trajectory sequences. Furthermore, this model is combined with the bidirectional long short-term memory recurrent neural networks (Bi-LSTM RNN) to obtain a network framework for processing the time-series data to obtain ship-collision avoidance decisions based on big data. The encoder–decoder neural networks were trained based on the AIS data obtained in 2018 from Zhoushan Port to achieve ship collision avoidance decision-making learning. The results indicated that the encoder–decoder neural networks can be used to effectively formulate the sequence of the collision avoidance decision of the USV. Thus, this study significantly contributes to the increased efficiency and safety of maritime transportation. The proposed method can potentially be applied to the USV technology and intelligent collision-avoidance systems.


32

Das, Amit Kumar, Abdullah Al Asif, Anik Paul, and Md Nur Hossain. "Bangla hate speech detection on social media using attention-based recurrent neural network." Journal of Intelligent Systems 30, no. 1 (January 1, 2021): 578–91. http://dx.doi.org/10.1515/jisys-2020-0060.


Abstract:

Hate speech has spread more rapidly through the daily use of technology, most notably through the sharing of negative opinions or feelings on social media. Although numerous works have addressed the detection of hate speech in English, German, and other languages, very few have addressed the Bengali language, even though millions of people communicate on social media in Bengali. The few existing works need improvements in both accuracy and interpretability. This article proposes an encoder–decoder-based machine learning model, a popular tool in NLP, to classify users' Bengali comments from Facebook pages. A dataset of 7,425 Bengali comments, covering seven distinct categories of hate speech, was used to train and evaluate the model. 1D convolutional layers were used to extract and encode local features from the comments. Finally, attention-based, LSTM-based, and GRU-based decoders were used to predict hate speech categories. Among the three encoder–decoder algorithms, the attention-based decoder obtained the best accuracy (77%).


33

Usman Younus, Muhammad, Rabia Shafi, Ammar Rafiq, Muhammad Rizwan Anjum, Sharjeel Afridi, Abdul Aleem Jamali, and Zulfiqar Ali Arain. "Encoder-Decoder Based LSTM Model to Advance User QoE in 360-Degree Video." Computers, Materials & Continua 71, no. 2 (2022): 2617–31. http://dx.doi.org/10.32604/cmc.2022.022236.


34

Zhu, Kedong, Yaping Li, Wenbo Mao, Feng Li, and Jiahao Yan. "LSTM enhanced by dual-attention-based encoder-decoder for daily peak load forecasting." Electric Power Systems Research 208 (July 2022): 107860. http://dx.doi.org/10.1016/j.epsr.2022.107860.


35

Li, Fa, Zhipeng Gui, Zhaoyu Zhang, Dehua Peng, Siyu Tian, Kunxiaojia Yuan, Yunzeng Sun, Huayi Wu, Jianya Gong, and Yichen Lei. "A hierarchical temporal attention-based LSTM encoder-decoder model for individual mobility prediction." Neurocomputing 403 (August 2020): 153–66. http://dx.doi.org/10.1016/j.neucom.2020.03.080.


36

Ellis, Matthew J., and Venkatesh Chinde. "An encoder–decoder LSTM-based EMPC framework applied to a building HVAC system." Chemical Engineering Research and Design 160 (August 2020): 508–20. http://dx.doi.org/10.1016/j.cherd.2020.06.008.


37

Zhang, Yikui, Silvan Ragettli, Peter Molnar, Olga Fink, and Nadav Peleg. "Generalization of an Encoder-Decoder LSTM model for flood prediction in ungauged catchments." Journal of Hydrology 614 (November 2022): 128577. http://dx.doi.org/10.1016/j.jhydrol.2022.128577.


38

Billah, Mohammad Masum, Jing Zhang, and Tianchi Zhang. "A Method for Vessel’s Trajectory Prediction Based on Encoder Decoder Architecture." Journal of Marine Science and Engineering 10, no. 10 (October 18, 2022): 1529. http://dx.doi.org/10.3390/jmse10101529.


Abstract:

Data-driven technologies and automatic identification systems (AISs) provide unprecedented opportunities for maritime surveillance. As part of enhancing maritime situational awareness and safety, in this paper, we address the issue of predicting a ship's future trajectory using historical AIS observations. The objective is to use past data in the training phase to learn the predictive distribution of marine traffic patterns and then use that information to forecast future trajectories. To achieve this, we investigate an encoder–decoder architecture-based sequence-to-sequence prediction model and a CNN model. This architecture includes a long short-term memory (LSTM) RNN that encodes sequential AIS data from the past and generates future trajectory samples. The effectiveness of sequence-to-sequence recurrent neural networks (RNNs) for forecasting future vessel trajectories is demonstrated through an experimental assessment using an AIS dataset.


39

Ni, Xiang, Jing Li, Mo Yu, Wang Zhou, and Kun-Lung Wu. "Generalizable Resource Allocation in Stream Processing via Deep Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 857–64. http://dx.doi.org/10.1609/aaai.v34i01.5431.


Abstract:

This paper considers the problem of resource allocation in stream processing, where continuous data flows must be processed in real time in a large distributed system. To maximize system throughput, the resource allocation strategy that partitions the computation tasks of a stream processing graph onto computing devices must simultaneously balance workload distribution and minimize communication. Since this problem of graph partitioning is known to be NP-complete yet crucial to practical streaming systems, many heuristic-based algorithms have been developed to find reasonably good solutions. In this paper, we present a graph-aware encoder-decoder framework to learn a generalizable resource allocation strategy that can properly distribute computation tasks of stream processing graphs unobserved from training data. We, for the first time, propose to leverage graph embedding to learn the structural information of the stream processing graphs. Jointly trained with the graph-aware decoder using deep reinforcement learning, our approach can effectively find optimized solutions for unseen graphs. Our experiments show that the proposed model outperforms both METIS, a state-of-the-art graph partitioning algorithm, and an LSTM-based encoder-decoder model, in about 70% of the test cases.


40

Poleak, Chanrith, and Jangwoo Kwon. "Parallel Image Captioning Using 2D Masked Convolution." Applied Sciences 9, no. 9 (May 7, 2019): 1871. http://dx.doi.org/10.3390/app9091871.


Abstract:

Automatically generating a novel description of an image is a challenging and important problem that brings together advanced research in both computer vision and natural language processing. In recent years, image captioning has significantly improved its performance by using long short-term memory (LSTM) as a decoder for the language model. However, despite this improvement, LSTM itself has its own shortcomings as a model because the structure is complicated and its nature is inherently sequential. This paper proposes a model using a simple convolutional network for both encoder and decoder functions of image captioning, instead of the current state-of-the-art approach. Our experiment with this model on a Microsoft Common Objects in Context (MSCOCO) captioning dataset yielded results that are competitive with the state-of-the-art image captioning model across different evaluation metrics, while having a much simpler model and enabling parallel graphics processing unit (GPU) computation during training, resulting in a faster training time.


41

Sun, Jiyu, Zemin Zhu, Jun Song, Junru Guo, Yu Cai, Yanzhao Fu, Linhui Wang, and A. Polonsky. "Research on multivariate Yellow Sea SST week prediction method based on encoder-decoder LSTM." Monitoring systems of environment, no. 1 (March 28, 2022): 5–14. http://dx.doi.org/10.33075/2220-5861-2022-1-5-14.


Abstract:

In order to further improve the accuracy and stability of sea surface temperature forecasting, this paper uses the 25-year historical data of OISST V2.0 and OAFlux, and fully considers factors such as radiation flux, heat flux, wind speed, air temperature, air specific humidity and SST. By controlling the variables and selecting the best model parameters, a multivariate Yellow Sea SST weekly prediction model based on the Encoder-Decoder LSTM (Long Short-Term Memory) was constructed for the first time. The model can effectively track the daily change trend of SST, and respond to its fluctuation changes to achieve relatively accurate prediction. Taking 2008 as an example, the daily absolute errors of the test set within a week are 0.3836, 0.4523, 0.5276, 0.5905, 0.6362, 0.6644, and 0.6827, and the overall RMSE is 0.7594. It is concluded that further research is needed on the optimization of predictors and the applicability of single-point forecasting using the discussed model.


42

Deshannavar, Umesh. "HIGH DIMENSIONAL WEATHER DATA USED IN A DEEP GENERATIVE MODEL TO PREDICT TRAJECTORIES OF AIRCRAFT." Journal of Airline Operations and Aviation Management 1, no. 1 (July 25, 2022): 80–88. http://dx.doi.org/10.56801/jaoam.v1i1.10.


Abstract:

The effectiveness of the aviation community depends on accurate forecasting of 4D aircraft trajectories, whether in real time or for counterfactual analysis. This research creates, for the first time, an effective tree-matching technique that builds image-like feature maps for historical flight trajectories from high-fidelity meteorological information, including wind, temperature, and convective conditions. The tracking points of a trajectory are modeled as a conditional Gaussian mixture whose parameters are learned by the proposed integrated iterative deep generative neural network model. The model consists of a long short-term memory (LSTM) encoder network and a network of mixture density LSTM decoders. The encoder network combines the most recently recorded flight plan information into fixed-length state variables, after which the decoder network learns additional spatiotemporal correlations from past flight routes and outputs the parameters of the Gaussian mixture. To learn feature representations from three-dimensional weather feature maps, transformation layers are added into the pipeline.


43

Chen, Zhe, Hongli Zhang, Lin Ye, and Shang Li. "An Approach Based on Multilevel Convolution for Sentence-Level Element Extraction of Legal Text." Wireless Communications and Mobile Computing 2021 (December 24, 2021): 1–12. http://dx.doi.org/10.1155/2021/1043872.


Abstract:

In the judicial field, with the increase of legal text data, the extraction of legal text elements plays an increasingly important role. In this paper, we propose a sentence-level model of legal text element extraction based on the structure of multilabel text classification. Our proposed model contains an encoder and an improved decoder. The encoder applies multilevel convolutional neural networks (CNN) and Long Short-Term Memory (LSTM) as feature extraction networks to extract local neighborhood and context information from legal text, and the decoder applies an LSTM with multiattention and a fully connected layer with an improved initialization method to decode and generate label sequences. To the best of our knowledge, it is one of the first attempts to apply a multilabel classification algorithm to element extraction of legal text. In order to verify the effectiveness of our model, we conduct experiments not only on three real legal text datasets but also on a general multilabel text classification dataset. The experimental results demonstrate that our proposed model outperforms baseline models on legal text datasets and is competitive with baseline models on the general multilabel text classification dataset, which indicates that our proposed model is useful for multilabel classification tasks of ordinary texts and legal texts with an uncertain number of characters in words and short lengths.


44

Trisedya, Bayu, Jianzhong Qi, and Rui Zhang. "Sentence Generation for Entity Description with Content-Plan Attention." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 9057–64. http://dx.doi.org/10.1609/aaai.v34i05.6439.


Abstract:

We study neural data-to-text generation. Specifically, we consider a target entity that is associated with a set of attributes. We aim to generate a sentence to describe the target entity. Previous studies use encoder-decoder frameworks where the encoder treats the input as a linear sequence and uses LSTM to encode the sequence. However, linearizing a set of attributes may not yield the proper order of the attributes, and hence leads the encoder to produce an improper context to generate a description. To handle disordered input, recent studies propose two-stage neural models that use pointer networks to generate a content-plan (i.e., content-planner) and use the content-plan as input for an encoder-decoder model (i.e., text generator). However, in two-stage models, the content-planner may yield an incomplete content-plan, due to missing one or more salient attributes in the generated content-plan. This will in turn cause the text generator to generate an incomplete description. To address these problems, we propose a novel attention model that exploits content-plan to highlight salient attributes in a proper order. The challenge of integrating a content-plan in the attention model of an encoder-decoder framework is to align the content-plan and the generated description. We handle this problem by devising a coverage mechanism to track the extent to which the content-plan is exposed in the previous decoding time-step, and hence it helps our proposed attention model select the attributes to be mentioned in the description in a proper order. Experimental results show that our model outperforms state-of-the-art baselines by up to 3% and 5% in terms of BLEU score on two real-world datasets, respectively.


45

Wang, Yiyu, Jungang Xu, and Yingfei Sun. "End-to-End Transformer Based Model for Image Captioning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2585–94. http://dx.doi.org/10.1609/aaai.v36i3.20160.


Abstract:

CNN-LSTM based architectures have played an important role in image captioning, but, because they are limited in training efficiency and expressive ability, researchers began to explore CNN-Transformer based models and achieved great success. Meanwhile, almost all recent works adopt Faster R-CNN as the backbone encoder to extract region-level features from given images. However, Faster R-CNN needs pre-training on an additional dataset, which divides the image captioning task into two stages and limits its potential applications. In this paper, we build a pure Transformer-based model, which integrates image captioning into one stage and realizes end-to-end training. First, we adopt SwinTransformer to replace Faster R-CNN as the backbone encoder to extract grid-level features from given images. Then, referring to Transformer, we build a refining encoder and a decoder. The refining encoder refines the grid features by capturing the intra-relationship between them, and the decoder decodes the refined features into captions word by word. Furthermore, in order to increase the interaction between multi-modal (vision and language) features and enhance the modeling capability, we calculate the mean pooling of grid features as the global feature, introduce it into the refining encoder to be refined together with the grid features, and add a pre-fusion process of the refined global feature and generated words in the decoder. To validate the effectiveness of our proposed model, we conduct experiments on the MSCOCO dataset. The experimental results compared to existing published works demonstrate that our model achieves new state-of-the-art performance of 138.2% (single model) and 141.0% (ensemble of 4 models) CIDEr scores on the 'Karpathy' offline test split and 136.0% (c5) and 138.3% (c40) CIDEr scores on the official online test server. Trained models and source code will be released.


46

Zhang, Fugui, Can Lai, and Wanjun Chen. "Weather Radar Echo Extrapolation Method Based on Deep Learning." Atmosphere 13, no. 5 (May 16, 2022): 815. http://dx.doi.org/10.3390/atmos13050815.


Abstract:

In order to forecast high-intensity, rapidly changing phenomena such as thunderstorms, heavy rain, and hail within 2 h, and to reduce the impact of destructive weather, this paper proposes a weather radar echo extrapolation method based on deep learning. The proposed method includes the design and combination of data preprocessing, a convolutional long short-term memory (Conv-LSTM) neuron, and an encoder–decoder model. We collect eleven thousand weather radar echo data at high spatiotemporal resolution; these data are preprocessed before they enter the neural network to improve data quality and training. Next, a neuron that integrates the structure and advantages of the convolutional neural network (CNN) and long short-term memory (LSTM), called Conv-LSTM, is applied to solve the problem that the fully connected LSTM (FC-LSTM) cannot extract the spatial information of the input data. This operation replaces the fully connected structure in the input-to-state and state-to-state parts so that the Conv-LSTM can extract information from other dimensions. Meanwhile, the encoder–decoder model is adopted, because of the size difference between the input and output data, and combined with the Conv-LSTM neuron. In the neural network training, a mean square error (MSE) loss function weighted according to the rainfall rate is added. Finally, the matrix “point-to-point” test method, including the probability of detection (POD), critical success index (CSI), and false alarm ratio (FAR), and the spatial test method contiguous rain areas (CRA) are used to examine the radar echo extrapolation results. Under the threshold of 30 dBZ, at the time of 1 h, we achieved 0.60 (POD), 0.42 (CSI) and 0.51 (FAR), compared with 0.42, 0.28 and 0.58 for the CTREC algorithm, and 0.30, 0.24 and 0.71 for the TITAN algorithm. Meanwhile, at the time of 1 h, we achieved 1.35 (total MSE) compared with 3.26 for the CTREC algorithm and 3.05 for the TITAN algorithm. The results demonstrate that the radar echo extrapolation method based on deep learning is clearly more accurate and stable than traditional radar echo extrapolation methods in near-term weather forecasting.
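
For readers unfamiliar with the point-to-point scores quoted in the abstract above, the sketch below computes POD, CSI and FAR from a predicted and an observed reflectivity grid using the standard contingency-table definitions. The function name `contingency_scores`, the grid layout, and the choice to treat values at or above the threshold as events are assumptions made for illustration; the paper's own evaluation code is not reproduced here.

```python
from typing import Dict, Sequence


def contingency_scores(
    pred: Sequence[Sequence[float]],
    obs: Sequence[Sequence[float]],
    threshold: float = 30.0,  # dBZ; ">=" as the event test is an assumption
) -> Dict[str, float]:
    """Compare two equally shaped grids cell by cell and return POD, CSI and FAR."""
    hits = misses = false_alarms = 0
    for pred_row, obs_row in zip(pred, obs):
        for p, o in zip(pred_row, obs_row):
            predicted_event = p >= threshold
            observed_event = o >= threshold
            if predicted_event and observed_event:
                hits += 1
            elif observed_event:
                misses += 1
            elif predicted_event:
                false_alarms += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined scores (no events at all) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "POD": ratio(hits, hits + misses),                 # detected share of observed events
        "CSI": ratio(hits, hits + misses + false_alarms),  # hits over all event cells
        "FAR": ratio(false_alarms, hits + false_alarms),   # wrong share of predicted events
    }
```

Higher POD and CSI and lower FAR indicate a better extrapolation, which is how the comparison with CTREC and TITAN in the abstract should be read.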


47

Varade, Saurabh, Ejaaz Sayyed, Vaibhavi Nagtode, and Shilpa Shinde. "Text Summarization using Extractive and Abstractive Methods." ITM Web of Conferences 40 (2021): 03023. http://dx.doi.org/10.1051/itmconf/20214003023.


Abstract:

Text summarization is a process in which a large text file is converted into a summarized version that preserves the original meaning and context. The main aim of any text summarization is to provide an accurate and precise summary. One approach is to use a sentence ranking algorithm, which falls under extractive summarization. Here, a graph-based ranking algorithm is used to rank the sentences in the text, and the top k-scored sentences are then included in the summary. The most widely used algorithm for deciding the importance of a vertex in a graph, based on the information retrieved from the graph, is the graph-based ranking algorithm. TextRank is one of the most efficient ranking algorithms, which is used for Web link analysis, that is, for measuring the importance of website pages. Another approach is abstractive summarization, where an LSTM encoder-decoder model is used along with an attention mechanism that focuses on some important words from the input. The encoder encodes the input sequence, and the decoder, along with the attention mechanism, gives the summary as the output.


48

Huang, Feiyan, Shangyou Zeng, Jie Ke, Songtong Lei, and JinJin Wang. "A Video Description Model with Improved Attention Mechanism." Journal of Physics: Conference Series 2384, no. 1 (December 1, 2022): 012015. http://dx.doi.org/10.1088/1742-6596/2384/1/012015.


Abstract:

Video description generation refers to the automatic generation of text descriptions of videos by computers, and belongs to the intersection of computer vision and natural language processing. To address the problems that the traditional attention mechanism has insufficient ability to extract video features, that the model is complex, and that the description quality is not high, this paper proposes a video description model with an improved attention mechanism. The model is based on the encoder-decoder structure, uses Inception-v4 as the encoder to extract features, and introduces a lightweight coordinate attention (CA) module into the attention mechanism, which improves the feature extraction effect and reduces the model complexity. The extracted important feature information is sent to the decoder, a long short-term memory (LSTM) network, to generate the description sentence corresponding to the video. The model is validated on the MSVD dataset using various evaluation metrics (BLEU, ROUGE-L, CIDEr, METEOR). The experimental results show that the video description model with the improved attention mechanism proposed in this paper achieves better accuracy on different performance metrics and can further improve the performance of video description.


49

Talafha, Bashar, Analle Abuammar, and Mahmoud Al-Ayyoub. "Atar: Attention-based LSTM for Arabizi transliteration." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 3 (June 1, 2021): 2327. http://dx.doi.org/10.11591/ijece.v11i3.pp2327-2334.


Abstract:

A non-standard romanization of Arabic script, known as Arabizi, is widely used in Arabic online and SMS/chat communities. However, since state-of-the-art tools and applications for Arabic NLP expect Arabic to be written in Arabic script, handling content written in Arabizi requires special attention, either by building customized tools or by transliterating it into Arabic script. The latter approach is the more common one, and this work presents two significant contributions in this direction. The first is to collect and publicly release the first large-scale “Arabizi to Arabic script” parallel corpus, focusing on the Jordanian dialect and consisting of more than 25k pairs carefully created and inspected by native speakers to ensure the highest quality. Second, we present Atar, an attention-based encoder-decoder model for Arabizi transliteration. Training and testing this model on our dataset yields impressive accuracy (79%) and BLEU score (88.49).


50

Yuan, Yuan, Lei Lin, Lian-Zhi Huo, Yun-Long Kong, Zeng-Guang Zhou, Bin Wu, and Yan Jia. "Using An Attention-Based LSTM Encoder–Decoder Network for Near Real-Time Disturbance Detection." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13 (2020): 1819–32. http://dx.doi.org/10.1109/jstars.2020.2988324.

