
Journal articles on the topic 'Encoder-decoder'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Encoder-decoder.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Chen, Zhang Jin, Guo Hai Zhong, and Zhuo Bi. "A High Speed 8B/10B Encoder/Decoder Design Based on Low Cost FPGA." Advanced Materials Research 462 (February 2012): 361–67. http://dx.doi.org/10.4028/www.scientific.net/amr.462.361.

Full text
Abstract:
A high-speed 8B/10B encoder/decoder is presented in this paper. The design is based on Altera's low-cost Cyclone FPGA family and uses a parallel pipeline structure. It is applied to the serializer/deserializer (SERDES) of a high-speed serial bus, and was synthesized and simulated with Quartus II 9.1. The synthesis and analysis results show a maximum frequency of more than 359 MHz, and the timing simulation results show a clock frequency of more than 125 MHz. The single-channel data rate of the serial bus can reach 1.25 Gbps. The proposed encoder/decoder can meet the requirements of most high-speed serial buses.
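As a quick arithmetic sanity check on the figures quoted in this abstract (a generic sketch of the 8B/10B rate relationship, not code from the paper): each 8-bit byte is transmitted as a 10-bit symbol, so a 125 MHz symbol clock implies a 1.25 Gbps serial line rate.

```python
# 8B/10B line-rate arithmetic implied by the abstract (illustrative only).
SYMBOL_CLOCK_HZ = 125e6   # parallel symbol clock from the timing simulation
BITS_PER_SYMBOL = 10      # each 8-bit byte is encoded as a 10-bit symbol
PAYLOAD_BITS = 8

line_rate_bps = SYMBOL_CLOCK_HZ * BITS_PER_SYMBOL   # serial line rate
payload_rate_bps = SYMBOL_CLOCK_HZ * PAYLOAD_BITS   # usable data rate
coding_efficiency = PAYLOAD_BITS / BITS_PER_SYMBOL  # 8B/10B overhead factor

print(line_rate_bps)      # matches the 1.25 Gbps quoted in the abstract
```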
APA, Harvard, Vancouver, ISO, and other styles
2

Li, Yehao, Yingwei Pan, Ting Yao, Jingwen Chen, and Tao Mei. "Scheduled Sampling in Vision-Language Pretraining with Decoupled Encoder-Decoder Network." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 8518–26. http://dx.doi.org/10.1609/aaai.v35i10.17034.

Abstract:
Despite having impressive vision-language (VL) pretraining with BERT-based encoder for VL understanding, the pretraining of a universal encoder-decoder for both VL understanding and generation remains challenging. The difficulty originates from the inherently different peculiarities of the two disciplines, e.g., VL understanding tasks capitalize on the unrestricted message passing across modalities, while generation tasks only employ visual-to-textual message passing. In this paper, we start with a two-stream decoupled design of encoder-decoder structure, in which two decoupled cross-modal encoder and decoder are involved to separately perform each type of proxy tasks, for simultaneous VL understanding and generation pretraining. Moreover, for VL pretraining, the dominant way is to replace some input visual/word tokens with mask tokens and enforce the multi-modal encoder/decoder to reconstruct the original tokens, but no mask token is involved when fine-tuning on downstream tasks. As an alternative, we propose a primary scheduled sampling strategy that elegantly mitigates such discrepancy via pretraining encoder-decoder in a two-pass manner. Extensive experiments demonstrate the compelling generalizability of our pretrained encoder-decoder by fine-tuning on four VL understanding and generation downstream tasks. Source code is available at https://github.com/YehLi/TDEN.
3

Martina, Maurizio, Mario Nicola, and Guido Masera. "VLSI Implementation of WiMax Convolutional Turbo Code Encoder and Decoder." Journal of Circuits, Systems and Computers 18, no. 03 (May 2009): 535–64. http://dx.doi.org/10.1142/s0218126609005241.

Abstract:
A VLSI encoder and decoder implementation for the IEEE 802.16 WiMax convolutional turbo code is presented. Architectural choices employed to achieve high throughput, while granting a limited occupation of resources, are addressed both for the encoder and decoder side, including also the subblock interleaving and symbol selection functions specified in the standard. The complete encoder and decoder architectures, implemented on a 0.13 μm standard cell technology, sustain a decoded throughput of more than 90 Mb/s with a 200 MHz clock frequency. The encoder has the complexity of 9.2 kgate of logic and 187.2 kbit of memory, whereas the complete decoder requires 167.7 kgate and 1163 kbit.
4

Andréasson, Joakim, Stephen D. Straight, Thomas A. Moore, Ana L. Moore, and Devens Gust. "Molecular All-Photonic Encoder−Decoder." Journal of the American Chemical Society 130, no. 33 (August 2008): 11122–28. http://dx.doi.org/10.1021/ja802845z.

5

Tasaki, Hirohisa. "Sound encoder and sound decoder." Journal of the Acoustical Society of America 123, no. 6 (2008): 4037. http://dx.doi.org/10.1121/1.2942447.

6

Miyasaka, Shuji. "Audio encoder and audio decoder." Journal of the Acoustical Society of America 127, no. 4 (2010): 2707. http://dx.doi.org/10.1121/1.3396203.

7

Abdulaziz AlArfaj, Abeer, and Hanan Ahmed Hosni Mahmoud. "A Moving Object Tracking Technique Using Few Frames with Feature Map Extraction and Feature Fusion." ISPRS International Journal of Geo-Information 11, no. 7 (July 7, 2022): 379. http://dx.doi.org/10.3390/ijgi11070379.

Abstract:
Moving object tracking techniques using machine and deep learning require large datasets for neural model training. New strategies need to be invented that utilize smaller training data sizes to realize the impact of large-sized datasets. However, current research does not balance the training data size and neural parameters, which creates the problem of the low visual data content providing inadequate information for parameter optimization. To enhance the performance of tracking a moving object that appears in only a few frames, this research proposes a deep learning model using an abundant encoder–decoder (a high-resolution transformer (HRT) encoder–decoder). An HRT encoder–decoder employs feature map extraction that focuses on high-resolution feature maps that are more representative of the moving object. In addition, we employ the proposed HRT encoder–decoder for feature map extraction and fusion to compensate for the few frames that contain the visual information. Our extensive experiments on the Pascal DOC19 and MS-DS17 datasets imply that the abundant HRT encoder–decoder model outperforms those of previous studies involving few frames that include moving objects.
8

Sokorynska, Natalia, Yurii Posternak, Liliia Zaitseva, and Oleksandr Rudenok. "THE METHOD OF ADAPTIVE SELECTION OF THE SIZE OF TURBO CODE STATE DIAGRAMS IN 5G AND IOT SYSTEMS." Technical Sciences and Technologies, no. 2(32) (2023): 249–60. http://dx.doi.org/10.25140/2411-5363-2023-2(32)-249-260.

Abstract:
The article proposes a method for optimizing the operation of the turbo code encoder/decoder in 5G and IoT systems through the adaptive selection of the state diagram size using the proposed decoding uncertainty indicator. The principles of forming state diagrams of the turbo code encoder and decoder are considered, and the uncertainty of data decoding is clarified. Using the a priori and a posteriori data of the turbo code decoder, an algorithm for changing the state diagram of the turbo code encoder/decoder is proposed. The essence of the method is to optimize the operation of the turbo code encoder and decoder by adaptively selecting the state diagram size using the proposed decoding uncertainty indicator. The implementation of the method will make it possible to increase the given indicators of information reliability without reducing the bandwidth of wireless data transmission systems and networks. In contrast to known results, an adaptive selection of the state diagram size of the TC encoder/decoder is made depending on the signal-to-noise ratio in the channel and on the normalized number of sign changes of the a posteriori-a priori logarithmic likelihood ratios for the transmitted data bits of the turbo code decoder. The simulation analysis shows that, to ensure the given indicators of information reliability, the method selects a rational size of the TC encoder/decoder state diagram, which is confirmed by comparison with other simulation results. The method can be used together with other methods of adaptation, for example, adaptation of the coding rate or of the polynomials of the TC component codes, in systems with multi-parameter adaptation operating under conditions of a priori uncertainty.
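The decoding uncertainty indicator described in this abstract (a normalized count of sign changes between a priori and a posteriori LLRs) reduces to a few lines; the function name and the example LLR values below are illustrative, not taken from the paper.

```python
def uncertainty_indicator(apriori_llrs, aposteriori_llrs):
    """Normalized count of bit positions whose log-likelihood ratio
    changes sign between decoder iterations (illustrative sketch)."""
    assert len(apriori_llrs) == len(aposteriori_llrs)
    changes = sum(
        1 for a, b in zip(apriori_llrs, aposteriori_llrs)
        if a * b < 0  # opposite signs: the decoder flipped its belief
    )
    return changes / len(apriori_llrs)

# A high indicator suggests an unreliable decode; an adaptive scheme
# could then switch to a larger encoder/decoder state diagram.
print(uncertainty_indicator([1.2, -0.4, 0.3, -2.0], [0.9, 0.5, -0.1, -2.2]))  # 0.5
```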
9

Seriès, Peggy, Alan A. Stocker, and Eero P. Simoncelli. "Is the Homunculus “Aware” of Sensory Adaptation?" Neural Computation 21, no. 12 (December 2009): 3271–304. http://dx.doi.org/10.1162/neco.2009.09-08-869.

Abstract:
Neural activity and perception are both affected by sensory history. The work presented here explores the relationship between the physiological effects of adaptation and their perceptual consequences. Perception is modeled as arising from an encoder-decoder cascade, in which the encoder is defined by the probabilistic response of a population of neurons, and the decoder transforms this population activity into a perceptual estimate. Adaptation is assumed to produce changes in the encoder, and we examine the conditions under which the decoder behavior is consistent with observed perceptual effects in terms of both bias and discriminability. We show that for all decoders, discriminability is bounded from below by the inverse Fisher information. Estimation bias, on the other hand, can arise for a variety of different reasons and can range from zero to substantial. We specifically examine biases that arise when the decoder is fixed, “unaware” of the changes in the encoding population (as opposed to “aware” of the adaptation and changing accordingly). We simulate the effects of adaptation on two well-studied sensory attributes, motion direction and contrast, assuming a gain change description of encoder adaptation. Although we cannot uniquely constrain the source of decoder bias, we find for both motion and contrast that an “unaware” decoder that maximizes the likelihood of the percept given by the preadaptation encoder leads to predictions that are consistent with behavioral data. This model implies that adaptation-induced biases arise as a result of temporary suboptimality of the decoder.
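The "unaware" decoder idea can be illustrated with a noiseless toy simulation (all tuning-curve and gain choices below are assumptions for illustration, not the paper's fitted model): adaptation reduces the gain of neurons tuned near the adapter, while the decoder still matches responses against the pre-adaptation tuning curves, producing a repulsive estimation bias.

```python
import math

def tuning(pref, s, width=1.0):
    # Pre-adaptation response of a neuron preferring stimulus `pref`.
    return math.exp(-((s - pref) ** 2) / (2 * width ** 2))

prefs = [i * 0.1 for i in range(-40, 41)]   # population of preferred stimuli
adapter, s_true = 0.0, 0.5

def adapted_response(pref, s):
    # Adaptation modeled as a gain reduction near the adapter (assumed profile).
    gain = 1.0 - 0.3 * math.exp(-((pref - adapter) ** 2) / 2)
    return gain * tuning(pref, s)

def decode(responses):
    # "Unaware" least-squares decoder: matches the observed population
    # response against the *pre-adaptation* tuning curves.
    grid = [i * 0.01 for i in range(-200, 201)]
    return min(grid, key=lambda s: sum(
        (r - tuning(p, s)) ** 2 for r, p in zip(responses, prefs)))

estimate = decode([adapted_response(p, s_true) for p in prefs])
print(estimate > s_true)   # True: the estimate is repelled from the adapter
```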
10

Kim, Minhoe, and Woongsup Lee. "Deep Spread Multiplexing and Study of Training Methods for DNN-Based Encoder and Decoder." Sensors 23, no. 8 (April 10, 2023): 3848. http://dx.doi.org/10.3390/s23083848.

Abstract:
We propose a deep spread multiplexing (DSM) scheme using a DNN-based encoder and decoder, and we investigate training procedures for such a system. Multiplexing over multiple orthogonal resources is designed with an autoencoder structure, which originates from deep learning techniques. Furthermore, we investigate training methods that can improve performance with respect to various aspects such as the channel model, the training signal-to-noise ratio (SNR) level and the noise type. The impact of these factors is evaluated by training the DNN-based encoder and decoder and verified with simulation results.
11

Li, Mingfei, Jiajian Wu, Zhengpeng Chen, Jiangbo Dong, Zhiping Peng, Kai Xiong, Mumin Rao, Chuangting Chen, and Xi Li. "Data-Driven Voltage Prognostic for Solid Oxide Fuel Cell System Based on Deep Learning." Energies 15, no. 17 (August 29, 2022): 6294. http://dx.doi.org/10.3390/en15176294.

Abstract:
A solid oxide fuel cell (SOFC) is an innovative power generation system that is green, efficient, and promising for a wide range of applications. The prediction and evaluation of the operation state of a solid oxide fuel cell system is of great significance for the stable and long-term operation of the power generation system. Prognostics and Health Management (PHM) technology is widely used to perform preventive and predictive maintenance on equipment. Unlike prediction based on the SOFC mechanistic model, the combination of PHM and deep learning has shown wide application prospects. Therefore, this study first obtains an experimental dataset through short-term degradation experiments on a 1 kW SOFC system, and then proposes an encoder–decoder RNN-based SOFC state prediction model. Based on the experimental dataset, the model can accurately predict the voltage variation of the SOFC system. The prediction results of the four different prediction models developed are compared and analyzed, namely, long short-term memory (LSTM), gated recurrent unit (GRU), encoder–decoder LSTM, and encoder–decoder GRU. The results show that for the SOFC test set, the mean square errors of encoder–decoder LSTM and encoder–decoder GRU are 0.015121 and 0.014966, respectively, whereas the corresponding errors of LSTM and GRU are 0.017050 and 0.017456, respectively. The encoder–decoder RNN model displays high prediction precision, proving that it can improve prediction accuracy; it is expected to be combined with control strategies to further help the implementation of PHM in fuel cells.
12

Chen, Qian, Ze Liu, Yi Zhang, Keren Fu, Qijun Zhao, and Hongwei Du. "RGB-D Salient Object Detection via 3D Convolutional Neural Networks." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1063–71. http://dx.doi.org/10.1609/aaai.v35i2.16191.

Abstract:
RGB-D salient object detection (SOD) recently has attracted increasing research interest and many deep learning methods based on encoder-decoder architectures have emerged. However, most existing RGB-D SOD models conduct feature fusion either in the single encoder or the decoder stage, which hardly guarantees sufficient cross-modal fusion ability. In this paper, we make the first attempt in addressing RGB-D SOD through 3D convolutional neural networks. The proposed model, named RD3D, aims at pre-fusion in the encoder stage and in-depth fusion in the decoder stage to effectively promote the full integration of RGB and depth streams. Specifically, RD3D first conducts pre-fusion across RGB and depth modalities through an inflated 3D encoder, and later provides in-depth feature fusion by designing a 3D decoder equipped with rich back-projection paths (RBPP) for leveraging the extensive aggregation ability of 3D convolutions. With such a progressive fusion strategy involving both the encoder and decoder, effective and thorough interaction between the two modalities can be exploited and boost the detection accuracy. Extensive experiments on six widely used benchmark datasets demonstrate that RD3D performs favorably against 14 state-of-the-art RGB-D SOD approaches in terms of four key evaluation metrics. Our code will be made publicly available: https://github.com/PPOLYpubki/RD3D.
13

Meng, Zhaorui, and Xianze Xu. "A Hybrid Short-Term Load Forecasting Framework with an Attention-Based Encoder–Decoder Network Based on Seasonal and Trend Adjustment." Energies 12, no. 24 (December 4, 2019): 4612. http://dx.doi.org/10.3390/en12244612.

Abstract:
Accurate electrical load forecasting plays an important role in power system operation. An effective load forecasting approach can improve the operation efficiency of a power system. This paper proposes the seasonal and trend adjustment attention encoder–decoder (STA–AED), a hybrid short-term load forecasting approach based on a multi-head attention encoder–decoder module with seasonal and trend adjustment. A seasonal and trend decomposing technique is used to preprocess the original electrical load data. Each decomposed datum is regressed to predict the future electric load value by utilizing the encoder–decoder network with the multi-head attention mechanism. With the multi-head attention mechanism, STA–AED can interpret the prediction results more effectively. A large number of experiments and extensive comparisons have been carried out with a load forecasting dataset from the United States. The proposed hybrid STA–AED model is superior to the other five counterpart models such as random forest, gradient boosting decision tree (GBDT), gated recurrent units (GRUs), Encoder–Decoder, and Encoder–Decoder with multi-head attention. The proposed hybrid model shows the best prediction accuracy in 14 out of 15 zones in terms of both root mean square error (RMSE) and mean absolute percentage error (MAPE).
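The seasonal-and-trend preprocessing step described in this abstract can be sketched with a simple additive decomposition (a minimal illustration using a moving-average trend and per-phase seasonal means; the actual STA–AED pipeline and its parameters may differ):

```python
def decompose(series, period):
    """Additive decomposition: series = trend + seasonal + residual.
    Centered moving average for the trend, per-phase means of the
    detrended series for the seasonal component (illustrative sketch)."""
    n = len(series)
    half = period // 2
    # Centered moving-average trend; edge windows are simply truncated.
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        window = series[lo:hi]
        trend.append(sum(window) / len(window))
    detrended = [x - t for x, t in zip(series, trend)]
    # Seasonal component: mean of the detrended values at each phase.
    seasonal_means = []
    for phase in range(period):
        vals = detrended[phase::period]
        seasonal_means.append(sum(vals) / len(vals))
    seasonal = [seasonal_means[i % period] for i in range(n)]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual

# Toy electrical load with a repeating period of 4 samples.
load = [10, 14, 18, 12, 11, 15, 19, 13, 12, 16, 20, 14]
trend, seasonal, residual = decompose(load, period=4)
# Each component would then be forecast separately by the attention
# encoder-decoder and the predictions recombined.
```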
14

Sridhar, Praveen Kumar, Nitin Srinivasan, Adithyan Arun Kumar, Gowthamaraj Rajendran, and Kishore Kumar Perumalsamy. "A Case Study on the Diminishing Popularity of Encoder-Only Architectures in Machine Learning Models." International Journal of Innovative Technology and Exploring Engineering 13, no. 4 (March 30, 2024): 22–27. http://dx.doi.org/10.35940/ijitee.d9827.13040324.

Abstract:
This paper examines the shift from encoder-only to decoder and encoder-decoder models in machine learning, highlighting the decline in popularity of encoder-only architectures. It explores the reasons behind this trend, such as the advancements in decoder models that offer superior generative capabilities, flexibility across various domains, and enhancements in unsupervised learning techniques. The study also discusses the role of prompting techniques in simplifying model architectures and enhancing model versatility. By analyzing the evolution, applications, and shifting preferences within the research community and industry, this paper aims to provide insights into the changing landscape of machine learning model architectures.
15

Van Der Putten, Joost, Fons Van Der Sommen, and Peter H. N. De With. "Efficient Decoder Reduction for a Variety of Encoder-Decoder Problems." IEEE Access 8 (2020): 169444–55. http://dx.doi.org/10.1109/access.2020.3020360.

16

Choi, Yong-Seok, Yo-Han Park, and Kong Joo Lee. "Building a Korean morphological analyzer using two Korean BERT models." PeerJ Computer Science 8 (May 2, 2022): e968. http://dx.doi.org/10.7717/peerj-cs.968.

Abstract:
A morphological analyzer plays an essential role in identifying functional suffixes of Korean words. The analyzer input and output differ from each other in their length and strings, which can be dealt with by an encoder-decoder architecture. We adopt a Transformer architecture, which is an encoder-decoder architecture with self-attention rather than a recurrent connection, to implement a Korean morphological analyzer. Bidirectional Encoder Representations from Transformers (BERT) is one of the most popular pretrained representation models; it can present an encoded sequence of input words, considering contextual information. We initialize both the Transformer encoder and decoder with two types of Korean BERT, one of which is pretrained with a raw corpus, and the other is pretrained with a morphologically analyzed dataset. Therefore, implementing a Korean morphological analyzer based on Transformer is a fine-tuning process with a relatively small corpus. A series of experiments proved that parameter initialization using pretrained models can alleviate the chronic problem of a lack of training data and reduce the time required for training. In addition, we can determine the number of layers required for the encoder and decoder to optimize the performance of a Korean morphological analyzer.
17

Li, Xinqing, Tanguy Tresor Sindihebura, Lei Zhou, Carlos M. Duarte, Daniel P. Costa, Mark A. Hindell, Clive McMahon, Mônica M. C. Muelbert, Xiangliang Zhang, and Chengbin Peng. "A prediction and imputation method for marine animal movement data." PeerJ Computer Science 7 (August 3, 2021): e656. http://dx.doi.org/10.7717/peerj-cs.656.

Abstract:
Data prediction and imputation are important parts of marine animal movement trajectory analysis as they can help researchers understand animal movement patterns and address missing data issues. Compared with traditional methods, deep learning methods can usually provide enhanced pattern extraction capabilities, but their applications in marine data analysis are still limited. In this research, we propose a composite deep learning model to improve the accuracy of marine animal trajectory prediction and imputation. The model extracts patterns from the trajectories with an encoder network and reconstructs the trajectories using these patterns with a decoder network. We use attention mechanisms to highlight certain extracted patterns for the decoder as well. We also feed these patterns into a second decoder for prediction and imputation. Therefore, our approach is a coupling of unsupervised learning with the encoder and the first decoder and supervised learning with the encoder and the second decoder. Experimental results demonstrate that our approach can reduce errors by at least 10% on average compared with other methods.
18

Bogawar, K. M., Sharda Mungale, and Manish Chavan. "Implementation of Turbo Encoder and Decoder." International Journal of Engineering Trends and Technology 8, no. 2 (February 25, 2014): 73–76. http://dx.doi.org/10.14445/22315381/ijett-v8p214.

19

Crossman, Antony Henry. "Fixed bit rate speech encoder/decoder." Journal of the Acoustical Society of America 103, no. 3 (March 1998): 1249. http://dx.doi.org/10.1121/1.423210.

20

Abd-El-Barr, Mostafa H. "CMOS quaternary logic encoder-decoder circuits." International Journal of Electronics 71, no. 2 (August 1991): 279–95. http://dx.doi.org/10.1080/00207219108925475.

21

Singh, Jaspreet, and Yashvardhan Sharma. "Encoder-Decoder Architectures for Generating Questions." Procedia Computer Science 132 (2018): 1041–48. http://dx.doi.org/10.1016/j.procs.2018.05.019.

22

Li, Xianqiang, Junshe An, and Yan Xie. "Design of 1553B Encoder and Decoder." Chinese Journal of Space Science 40, no. 4 (2020): 602. http://dx.doi.org/10.11728/cjss2020.04.602.

23

Han, Huihui, Weitao Li, Jianping Wang, Dian Jiao, and Baishun Sun. "Semantic segmentation of encoder-decoder structure." Journal of Image and Graphics 25, no. 2 (2020): 255–66. http://dx.doi.org/10.11834/jig.190212.

24

Kumar, G. Madhu, and A. Swetha. "Design and Implementation of Convolution Encoder and Viterbi Decoder." International Journal of Scientific Research 1, no. 6 (June 1, 2012): 65–66. http://dx.doi.org/10.15373/22778179/nov2012/23.

25

Park, YeongHyeon, and Il Yun. "Fast Adaptive RNN Encoder–Decoder for Anomaly Detection in SMD Assembly Machine." Sensors 18, no. 10 (October 22, 2018): 3573. http://dx.doi.org/10.3390/s18103573.

Abstract:
A Surface Mounted Device (SMD) assembly machine manufactures various products on a flexible manufacturing line, so an anomaly detection model that can adapt to the various manufacturing environments very quickly is required. In this paper, we propose a fast adaptive anomaly detection model based on a Recurrent Neural Network (RNN) encoder–decoder with operating machine sounds. The RNN encoder–decoder has a structure very similar to an auto-encoder (AE), but the former has significantly fewer parameters than the latter because of its rolled structure; thus, the RNN encoder–decoder only requires a short training process for fast adaptation. The anomaly detection model decides abnormality based on the Euclidean distance between generated sequences and the observed sequence from machine sounds. Experimental evaluation was conducted on a dataset from the SMD assembly machine. Results showed cutting-edge performance with fast adaptation.
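The anomaly decision described in this abstract (Euclidean distance between the model-generated sequence and the observed sequence, compared against a threshold) reduces to a few lines; the threshold value and the toy sequences below are placeholders, not the paper's settings:

```python
import math

def anomaly_score(generated, observed):
    """Euclidean distance between a model-generated sequence and the
    observed machine-sound feature sequence."""
    return math.sqrt(sum((g - o) ** 2 for g, o in zip(generated, observed)))

def is_anomalous(generated, observed, threshold=1.0):
    # The threshold is illustrative; in practice it is tuned on normal data.
    return anomaly_score(generated, observed) > threshold

generated = [0.1, 0.2, 0.1, 0.3]     # the encoder-decoder's reconstruction
observed  = [0.1, 0.2, 0.1, 0.3]     # normal operation matches closely
faulty    = [0.9, 1.4, 0.2, 1.1]     # abnormal operation deviates strongly

print(is_anomalous(generated, observed))  # False
print(is_anomalous(generated, faulty))    # True
```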
26

Lamar, Annie K. "Generating Metrically Accurate Homeric Poetry with Recurrent Neural Networks." International Journal of Transdisciplinary Artificial Intelligence 2, no. 1 (August 1, 2020): 1–25. http://dx.doi.org/10.35708/tai1869-126247.

Abstract:
We investigate the generation of metrically accurate Homeric poetry using recurrent neural networks (RNNs). We assess two models: a basic encoder-decoder RNN and the hierarchical recurrent encoder-decoder model (HRED). We assess the quality of the generated lines of poetry using quantitative metrical analysis and expert evaluation. This evaluation reveals that while the basic encoder-decoder is able to capture complex poetic meter, it underperforms in terms of semantic coherence. The HRED model, however, produces more semantically coherent lines of poetry but is unable to capture the meter. Our research highlights the importance of expert evaluation and suggests that future research should focus on encoder-decoder models that balance various types of input – both immediate and long-range.
27

Kodavalla, Vijay Kumar, and P. G. Krishna Mohan. "Distributed Video Coding: Feedback-Free Architecture and Implementation." International Journal of Image and Graphics 12, no. 02 (April 2012): 1250010. http://dx.doi.org/10.1142/s0219467812500106.

Abstract:
Distributed video coding (DVC) is a new video coding paradigm for emerging applications such as wireless video cameras, wireless low-power surveillance networks, disposable video cameras, sensor networks, networked camcorders, etc. In traditional video coding standards (MPEG/H.264/DivX/VC1), the encoder is typically five to ten times more complex than the decoder, which is well suited to broadcast and streaming video-on-demand systems, where video is compressed once and decoded many times. However, the emerging applications require the dual system, i.e. low-complexity encoders, possibly at the expense of highly complex decoders. Here, low-complexity encoders are a must because memory, computational power and energy are scarce at the encoder. Distributed coding exploits source statistics in the decoder, so the encoder can be very simple at the expense of a more complex decoder. The various DVC architectures proposed in the literature depend on the availability of a feedback channel from decoder to encoder to achieve the minimum rate for a target quality. In practical systems, bidirectional communication channels are usually not available; the feedback channel also has implications for decoding delay and decoder complexity. Hence it is highly desirable to design DVC without the need for a feedback channel. In this paper, a feedback-free DVC architecture is proposed and C-model implementation results are presented.
28

Khanh, Trinh Le Ba, Duy-Phuong Dao, Ngoc-Huynh Ho, Hyung-Jeong Yang, Eu-Tteum Baek, Gueesang Lee, Soo-Hyung Kim, and Seok Bong Yoo. "Enhancing U-Net with Spatial-Channel Attention Gate for Abnormal Tissue Segmentation in Medical Imaging." Applied Sciences 10, no. 17 (August 19, 2020): 5729. http://dx.doi.org/10.3390/app10175729.

Abstract:
In recent years, deep learning has dominated medical image segmentation. Encoder-decoder architectures, such as U-Net, can be used in state-of-the-art models with powerful designs that are achieved by implementing skip connections that propagate local information from an encoder path to a decoder path to retrieve detailed spatial information lost by pooling operations. Despite their effectiveness for segmentation, these naïve skip connections still have some disadvantages. First, multi-scale skip connections tend to use unnecessary information and computational sources, as the same low-level encoder features are repeatedly used at multiple scales. Second, the contextual information of the low-level encoder feature is insufficient, leading to poor performance for pixel-wise recognition when concatenating with the corresponding high-level decoder feature. In this study, we propose a novel spatial-channel attention gate that addresses the limitations of plain skip connections. It can be easily integrated into an encoder-decoder network to effectively improve the performance of the image segmentation task. Comprehensive results reveal that our spatial-channel attention gate remarkably enhances the segmentation capability of the U-Net architecture with minimal computational overhead added. The experimental results show that our proposed method outperforms the conventional deep networks in terms of Dice score, achieving 71.72%.
29

Gu, Yeonghyeon, Zhegao Piao, and Seong Joon Yoo. "STHarDNet: Swin Transformer with HarDNet for MRI Segmentation." Applied Sciences 12, no. 1 (January 4, 2022): 468. http://dx.doi.org/10.3390/app12010468.

Abstract:
In magnetic resonance imaging (MRI) segmentation, conventional approaches utilize U-Net models with encoder–decoder structures, segmentation models using vision transformers, or models that combine a vision transformer with an encoder–decoder model structure. However, conventional models have large sizes and slow computation speed and, in vision transformer models, the computation amount sharply increases with the image size. To overcome these problems, this paper proposes a model that combines Swin transformer blocks and a lightweight U-Net type model that has a HarDNet blocks-based encoder–decoder structure. To maintain the features of the hierarchical transformer and the shifted-windows approach of the Swin transformer model, the Swin transformer is used in the first skip connection layer of the encoder instead of in the encoder–decoder bottleneck. The proposed model, called STHarDNet, was evaluated by separating the anatomical tracings of lesions after stroke (ATLAS) dataset, which comprises 229 T1-weighted MRI images, into training and validation datasets. It achieved Dice, IoU, precision, and recall values of 0.5547, 0.4185, 0.6764, and 0.5286, respectively, which are better than those of the state-of-the-art models U-Net, SegNet, PSPNet, FCHarDNet, TransHarDNet, Swin Transformer, Swin UNet, X-Net, and D-UNet. Thus, STHarDNet improves the accuracy and speed of MRI image-based stroke diagnosis.
30

Shim, Jae-hun, Hyunwoo Yu, Kyeongbo Kong, and Suk-Ju Kang. "FeedFormer: Revisiting Transformer Decoder for Efficient Semantic Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 2 (June 26, 2023): 2263–71. http://dx.doi.org/10.1609/aaai.v37i2.25321.

Abstract:
With the success of the Vision Transformer (ViT) in image classification, its variants have yielded great success in many downstream vision tasks. Among those, the semantic segmentation task has also benefited greatly from the advance of ViT variants. However, most studies of the transformer for semantic segmentation only focus on designing efficient transformer encoders, rarely giving attention to designing the decoder. Several studies have attempted to use the transformer decoder as the segmentation decoder with class-wise learnable queries. Instead, we aim to directly use the encoder features as the queries. This paper proposes the Feature Enhancing Decoder transFormer (FeedFormer), which enhances structural information using the transformer decoder. Our goal is to decode the high-level encoder features using the lowest-level encoder feature. We do this by formulating the high-level features as queries, and the lowest-level feature as the key and value. This enhances the high-level features by collecting structural information from the lowest-level feature. Additionally, we use a simple reformation trick, pushing the encoder blocks to take the place of the existing self-attention module of the decoder, to improve efficiency. We show the superiority of our decoder against various lightweight transformer-based decoders on popular semantic segmentation datasets. Despite the minute computation, our model has achieved state-of-the-art performance in the performance-computation trade-off. Our model FeedFormer-B0 surpasses SegFormer-B0 with 1.8% higher mIoU and 7.1% less computation on ADE20K, and 1.7% higher mIoU and 14.4% less computation on Cityscapes, respectively. Code will be released at: https://github.com/jhshim1995/FeedFormer.
APA, Harvard, Vancouver, ISO, and other styles
31

Kahina, Rekkal, and Abdesselam Bassou. "Improving the Performance of Viterbi Decoder using Window System." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 1 (February 1, 2018): 611. http://dx.doi.org/10.11591/ijece.v8i1.pp611-621.

Full text
Abstract:
An efficient Viterbi decoder, called the Viterbi decoder with window system, is introduced in this paper. Simulation results over Gaussian channels are presented for rates 1/2, 1/3, and 2/3 combined with TCM encoders of memory order 2 and 3. These results show that the proposed scheme outperforms the classical Viterbi decoder by a gain of 1 dB. In addition, we propose a function called RSCPOLY2TRELLIS for recursive systematic convolutional (RSC) encoders, which creates the trellis structure of an RSC encoder from the matrix “H”. Moreover, we present a comparison between the decoding algorithms of the TCM encoder, namely soft and hard Viterbi decoding, and the variants of the MAP decoder known as the BCJR or forward-backward algorithm, which performs very well in decoding TCM but depends on the code size, memory, and CPU requirements of the application.
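As background for the decoder being improved here, a minimal hard-decision Viterbi decoder for the textbook rate-1/2, memory-2 convolutional code with generators (7, 5) in octal — far simpler than the paper's TCM setting, but it shows the add-compare-select and traceback steps:

```python
def conv_encode(bits):
    """Rate-1/2 convolutional encoder, generators (7, 5) octal, memory 2."""
    state, out = 0, []
    for b in bits:
        out += [b ^ (state >> 1) ^ (state & 1),  # g0 = 111
                b ^ (state & 1)]                 # g1 = 101
        state = (b << 1) | (state >> 1)
    return out

def viterbi_decode(received, n_bits):
    """Add-compare-select over the 4-state trellis, then traceback."""
    INF = 10**9
    metrics = [0, INF, INF, INF]          # start in the all-zero state
    history = []
    for t in range(n_bits):
        r0, r1 = received[2 * t], received[2 * t + 1]
        new_metrics = [INF] * 4
        back = [None] * 4
        for s in range(4):
            if metrics[s] >= INF:
                continue
            for b in (0, 1):
                o0 = b ^ (s >> 1) ^ (s & 1)   # same formulas as the encoder
                o1 = b ^ (s & 1)
                ns = (b << 1) | (s >> 1)
                m = metrics[s] + (o0 != r0) + (o1 != r1)   # Hamming branch metric
                if m < new_metrics[ns]:       # compare-select: keep the survivor
                    new_metrics[ns] = m
                    back[ns] = (s, b)
        history.append(back)
        metrics = new_metrics
    s = min(range(4), key=metrics.__getitem__)
    bits = []
    for back in reversed(history):            # traceback along the survivor path
        s, b = back[s]
        bits.append(b)
    return bits[::-1]
```

With two tail zeros to flush the encoder, a single channel error is always corrected, since this code's free distance is 5.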
APA, Harvard, Vancouver, ISO, and other styles
32

R, Rohith, and Saji A J. "BCH Encoder and Decoder for Emerging Memories." December 2020 2, no. 4 (January 19, 2021): 220–27. http://dx.doi.org/10.36548/jei.2020.4.004.

Full text
Abstract:
In this paper, an encoder and decoder system based on a Bose-Chaudhuri-Hocquenghem (BCH) double-error-correcting and triple-error-detecting (DEC-TED) code is proposed for emerging memories, targeting low power and high decoding efficiency. An adaptive error correction technique and an invalid-transition inhibition technique are applied to the decoder to improve decoding efficiency and reduce power consumption and delay. The adaptive error correction technique yields high decoding efficiency, and the invalid-transition inhibition technique reduces the power consumption of conventional BCH decoders. The DEC-TED BCH decoder combines these two techniques by using a specific error-correcting-code clock and flip-flops. This provides an error-correcting encoder and decoder solution for low-power, high-performance applications using emerging memories. The design is simulated on a Xilinx FPGA using ISE Design Suite 14.5.
APA, Harvard, Vancouver, ISO, and other styles
33

Zheng, Chuanpan, Xiaoliang Fan, Cheng Wang, and Jianzhong Qi. "GMAN: A Graph Multi-Attention Network for Traffic Prediction." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 1234–41. http://dx.doi.org/10.1609/aaai.v34i01.5477.

Full text
Abstract:
Long-term traffic prediction is highly challenging due to the complexity of traffic systems and the constantly changing nature of many impacting factors. In this paper, we focus on the spatio-temporal factors, and propose a graph multi-attention network (GMAN) to predict traffic conditions for time steps ahead at different locations on a road network graph. GMAN adopts an encoder-decoder architecture, where both the encoder and the decoder consist of multiple spatio-temporal attention blocks to model the impact of the spatio-temporal factors on traffic conditions. The encoder encodes the input traffic features and the decoder predicts the output sequence. Between the encoder and the decoder, a transform attention layer is applied to convert the encoded traffic features into the sequence representations of future time steps, which serve as the input of the decoder. The transform attention mechanism models the direct relationships between historical and future time steps, which helps alleviate the error propagation problem among prediction time steps. Experimental results on two real-world traffic prediction tasks (i.e., traffic volume prediction and traffic speed prediction) demonstrate the superiority of GMAN. In particular, in the 1 hour ahead prediction, GMAN outperforms state-of-the-art methods by up to 4% improvement in MAE measure. The source code is available at https://github.com/zhengchuanpan/GMAN.
APA, Harvard, Vancouver, ISO, and other styles
34

Chen, Yunfan, and Hyunchul Shin. "Pedestrian Detection at Night in Infrared Images Using an Attention-Guided Encoder-Decoder Convolutional Neural Network." Applied Sciences 10, no. 3 (January 23, 2020): 809. http://dx.doi.org/10.3390/app10030809.

Full text
Abstract:
Pedestrian-related accidents are much more likely to occur during nighttime when visible (VI) cameras are much less effective. Unlike VI cameras, infrared (IR) cameras can work in total darkness. However, IR images have several drawbacks, such as low-resolution, noise, and thermal energy characteristics that can differ depending on the weather. To overcome these drawbacks, we propose an IR camera system to identify pedestrians at night that uses a novel attention-guided encoder-decoder convolutional neural network (AED-CNN). In AED-CNN, encoder-decoder modules are introduced to generate multi-scale features, in which new skip connection blocks are incorporated into the decoder to combine the feature maps from the encoder and decoder module. This new architecture increases context information which is helpful for extracting discriminative features from low-resolution and noisy IR images. Furthermore, we propose an attention module to re-weight the multi-scale features generated by the encoder-decoder module. The attention mechanism effectively highlights pedestrians while eliminating background interference, which helps to detect pedestrians under various weather conditions. Empirical experiments on two challenging datasets fully demonstrate that our method shows superior performance. Our approach significantly improves the precision of the state-of-the-art method by 5.1% and 23.78% on the Keimyung University (KMU) and Computer Vision Center (CVC)-09 pedestrian dataset, respectively.
APA, Harvard, Vancouver, ISO, and other styles
35

Shi, Han, Haozheng Fan, and James T. Kwok. "Effective Decoding in Graph Auto-Encoder Using Triadic Closure." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 906–13. http://dx.doi.org/10.1609/aaai.v34i01.5437.

Full text
Abstract:
The (variational) graph auto-encoder and its variants have been popularly used for representation learning on graph-structured data. While the encoder is often a powerful graph convolutional network, the decoder reconstructs the graph structure by only considering two nodes at a time, thus ignoring possible interactions among edges. On the other hand, structured prediction, which considers the whole graph simultaneously, is computationally expensive. In this paper, we utilize the well-known triadic closure property which is exhibited in many real-world networks. We propose the triad decoder, which considers and predicts the three edges involved in a local triad together. The triad decoder can be readily used in any graph-based auto-encoder. In particular, we incorporate this to the (variational) graph auto-encoder. Experiments on link prediction, node clustering and graph generation show that the use of triads leads to more accurate prediction, clustering and better preservation of the graph characteristics.
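The contrast between a pairwise decoder and a joint triad decoder can be made concrete in a toy NumPy sketch. The particular joint scoring function below (an inner product plus a term modulated by the third node's embedding) is a hypothetical stand-in, not the paper's exact parameterization:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pairwise_decode(z, i, j):
    """Standard graph auto-encoder decoder: scores one edge at a time."""
    return sigmoid(z[i] @ z[j])

def triad_decode(z, i, j, k):
    """Score all three edges of the triad (i, j, k) together, letting each
    edge see the embedding of the third node (triadic closure: friends of
    friends tend to be friends)."""
    probs = {}
    for (a, b), c in (((i, j), k), ((j, k), i), ((i, k), j)):
        # pairwise term + a hypothetical triad term modulated by node c
        probs[(a, b)] = sigmoid(z[a] @ z[b] + (z[a] * z[b]) @ z[c])
    return probs

z = np.random.default_rng(0).normal(size=(5, 8))   # 5 toy node embeddings
p = triad_decode(z, 0, 1, 2)                       # probabilities for 3 edges
```

Like the paper's triad decoder, this drops into any graph auto-encoder: reconstruction loss is summed over sampled triads instead of independent node pairs.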
APA, Harvard, Vancouver, ISO, and other styles
36

Zhang, Wenbo, Xiao Li, Yating Yang, Rui Dong, and Gongxu Luo. "Keeping Models Consistent between Pretraining and Translation for Low-Resource Neural Machine Translation." Future Internet 12, no. 12 (November 27, 2020): 215. http://dx.doi.org/10.3390/fi12120215.

Full text
Abstract:
Recently, the pretraining of models has been successfully applied to unsupervised and semi-supervised neural machine translation. A cross-lingual language model uses a pretrained masked language model to initialize the encoder and decoder of the translation model, which greatly improves the translation quality. However, because of a mismatch in the number of layers, the pretrained model can only initialize part of the decoder’s parameters. In this paper, we use a layer-wise coordination transformer and a consistent pretraining translation transformer instead of a vanilla transformer as the translation model. The former has only an encoder, and the latter has an encoder and a decoder, but the encoder and decoder have exactly the same parameters. Both models can guarantee that all parameters in the translation model can be initialized by the pretrained model. Experiments on the Chinese–English and English–German datasets show that compared with the vanilla transformer baseline, our models achieve better performance with fewer parameters when the parallel corpus is small.
APA, Harvard, Vancouver, ISO, and other styles
37

Azizi, Fityan, Mgs M. Luthfi Ramadhan, and Wisnu Jatmiko. "Encoder-Decoder with Atrous Spatial Pyramid Pooling for Left Ventricle Segmentation in Echocardiography." Jurnal Ilmu Komputer dan Informasi 16, no. 2 (July 3, 2023): 163–69. http://dx.doi.org/10.21609/jiki.v16i2.1165.

Full text
Abstract:
Assessment of cardiac function using echocardiography is an essential and widely used method. Assessment by manually labeling the left ventricle area can generally be time-consuming, error-prone, and subject to interobserver variability. Thus, automatic delineation of the left ventricle area is necessary so that the assessment can be carried out effectively and efficiently. In this study, an encoder-decoder based deep learning model for left ventricle segmentation in echocardiography was developed using the effective CNN U-Net encoder combined with the DeepLabV3+ decoder, which has efficient performance and is able to produce sharper and more accurate segmentation results. Furthermore, an Atrous Spatial Pyramid Pooling module was added to the encoder to improve feature extraction. Tested on the EchoNet-Dynamic dataset, the proposed model gives better results than the U-Net, DeepLabV3+, and DeepLabV3 models, producing a dice similarity coefficient of 92.87%. The experimental results show that combining the U-Net encoder and DeepLabV3+ decoder provides increased performance compared to previous studies.
APA, Harvard, Vancouver, ISO, and other styles
38

Yuan, Sanyi, Xinqi Jiao, Yaneng Luo, Wenjing Sang, and Shangxu Wang. "Double-scale supervised inversion with a data-driven forward model for low-frequency impedance recovery." GEOPHYSICS 87, no. 2 (December 27, 2021): R165—R181. http://dx.doi.org/10.1190/geo2020-0421.1.

Full text
Abstract:
Low-frequency information is important in reducing the nonuniqueness of absolute impedance inversion and for quantitative seismic interpretation. In traditional model-driven impedance inversion methods, the low-frequency impedance background is from an initial model and is almost unchanged during the inversion process. Moreover, the inversion results are limited by the quality of the modeled seismic data and the extracted wavelet. To alleviate these issues, we have investigated a double-scale supervised impedance inversion method based on the gated recurrent encoder-decoder network (GREDN). We first train the decoder network of GREDN called the forward operator, which can map impedance to seismic data. We then implement the well-trained decoder as a constraint to train the encoder network of GREDN called the inverse operator. Besides matching the output of the encoder with broadband pseudowell impedance labels, data generated by inputting the encoder output into the known decoder match the observed narrowband seismic data. The broadband impedance information and the already-trained decoder largely limit the solution space of the encoder. Finally, after training, only the derived optimal encoder is applied to unseen seismic traces to yield broadband impedance volumes. Our approach is fully data driven and does not involve the initial model, seismic wavelet, and model-driven operator. Tests on the Marmousi model illustrate that our double-scale supervised impedance inversion method can effectively recover low-frequency components of the impedance model, and we determine that low frequencies of the predicted impedance originate from well logs. Furthermore, we apply the strategy of combining the double-scale supervised impedance inversion method with a model-driven impedance inversion method to process field seismic data. Tests on a field data set indicate that the predicted impedance results not only reveal a classic tectonic sedimentation history but also match the corresponding results measured at the locations of two wells.
APA, Harvard, Vancouver, ISO, and other styles
39

Schuster, Viktoria, and Anders Krogh. "A Manifold Learning Perspective on Representation Learning: Learning Decoder and Representations without an Encoder." Entropy 23, no. 11 (October 25, 2021): 1403. http://dx.doi.org/10.3390/e23111403.

Full text
Abstract:
Autoencoders are commonly used in representation learning. They consist of an encoder and a decoder, which provide a straightforward method to map n-dimensional data in input space to a lower m-dimensional representation space and back. The decoder itself defines an m-dimensional manifold in input space. Inspired by manifold learning, we showed that the decoder can be trained on its own by learning the representations of the training samples along with the decoder weights using gradient descent. A sum-of-squares loss then corresponds to optimizing the manifold to have the smallest Euclidean distance to the training samples, and similarly for other loss functions. We derived expressions for the number of samples needed to specify the encoder and decoder and showed that the decoder generally requires much fewer training samples to be well-specified compared to the encoder. We discuss the training of autoencoders in this perspective and relate it to previous work in the field that uses noisy training examples and other types of regularization. On the natural image data sets MNIST and CIFAR10, we demonstrated that the decoder is much better suited to learn a low-dimensional representation, especially when trained on small data sets. Using simulated gene regulatory data, we further showed that the decoder alone leads to better generalization and meaningful representations. Our approach of training the decoder alone facilitates representation learning even on small data sets and can lead to improved training of autoencoders. We hope that the simple analyses presented will also contribute to an improved conceptual understanding of representation learning.
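The paper's central idea — training only the decoder, treating each sample's representation as a free parameter updated by gradient descent — fits in a few lines of NumPy for a linear decoder. The dimensions, learning rate, and synthetic data below are illustrative choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 200, 10, 2                    # samples, input dim, representation dim

# Synthetic data lying near a 2-D linear manifold embedded in 10-D space.
true_D = rng.normal(size=(d, m))
X = rng.normal(size=(n, m)) @ true_D.T + 0.01 * rng.normal(size=(n, d))

D = 0.1 * rng.normal(size=(d, m))       # decoder weights
Z = 0.1 * rng.normal(size=(n, m))       # one learnable representation per sample
lr = 0.01
for _ in range(3000):
    R = Z @ D.T - X                     # reconstruction residuals, shape (n, d)
    D -= lr * R.T @ Z / n               # gradient of the mean squared error
    Z -= lr * R @ D                     # each z_i follows its own residual only

mse = float(np.mean((Z @ D.T - X) ** 2))
```

No encoder appears anywhere: at convergence, `D` spans (approximately) the data manifold and `Z` holds the learned low-dimensional representations, matching the sum-of-squares interpretation in the abstract.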
APA, Harvard, Vancouver, ISO, and other styles
40

Zhang, Wei, Shangmin Luan, and Liqin Tian. "A Rapid Combined Model for Automatic Generating Web UI Codes." Wireless Communications and Mobile Computing 2022 (February 8, 2022): 1–10. http://dx.doi.org/10.1155/2022/4415479.

Full text
Abstract:
An encoder-decoder network is usually applied to image captioning to automatically generate descriptive text for a picture. A Web user interface (Web UI) is a special type of image and is usually described by HTML (hypertext markup language). Consequently, it becomes possible to use an encoder-decoder network to generate the corresponding code from a screenshot of a Web UI. The basic structure of the decoder is an RNN, LSTM, GRU, or other recurrent neural network. However, this kind of decoder needs a long training time, which increases the time complexity of both training and prediction. HTML is a typical structured language for describing a Web UI, but it is hard to express the timing characteristics of the word sequence and the complex context. To resolve these problems efficiently, a rapid combined model (RCM) is designed in this paper. The basic structure of the RCM is an encoder-decoder network. The word embedding matrix and the visual model are included in the encoder. The word embedding matrix uses fully connected units. Compared with LSTM, the accuracy of the word embedding matrix is basically unchanged, but the training and prediction speed are significantly improved. In the visual model, the pretrained InceptionV3 network is used to generate the image vector, which not only improves the quality of recognition of the Web UI image but also significantly reduces the training time of the RCM. In the decoder, the word embedding vector and the image vector are integrated together and input into the prediction model for word prediction.
APA, Harvard, Vancouver, ISO, and other styles
41

Escolano, Carlos, Marta Ruiz Costa-jussà, and José A. R. Fonollosa. "Multilingual Machine Translation: Deep Analysis of Language-Specific Encoder-Decoders." Journal of Artificial Intelligence Research 73 (April 25, 2022): 1535–52. http://dx.doi.org/10.1613/jair.1.12699.

Full text
Abstract:
State-of-the-art multilingual machine translation relies on a shared encoder-decoder. In this paper, we propose an alternative approach based on language-specific encoder-decoders, which can be easily extended to new languages by learning their corresponding modules. To establish a common interlingua representation, we simultaneously train N initial languages. Our experiments show that the proposed approach improves over the shared encoder-decoder for the initial languages and when adding new languages, without the need to retrain the remaining modules. All in all, our work closes the gap between shared and language-specific encoder-decoders, advancing toward modular multilingual machine translation systems that can be flexibly extended in lifelong learning settings.
APA, Harvard, Vancouver, ISO, and other styles
42

Geng, Yaogang, Hongyan Mei, Xiaorong Xue, and Xing Zhang. "Image-Caption Model Based on Fusion Feature." Applied Sciences 12, no. 19 (September 30, 2022): 9861. http://dx.doi.org/10.3390/app12199861.

Full text
Abstract:
The encoder–decoder framework is the mainstream framework for image captioning. The convolutional neural network (CNN) is usually used to extract grid-level features of the image, and the graph convolutional neural network (GCN) is used to extract the image’s region-level features. Grid-level features are poor in semantic information, such as the relationship and location of objects, while regional features lack fine-grained information about images. To address this problem, this paper proposes a fusion-features-based image-captioning model, which includes the fusion feature encoder and LSTM decoder. The fusion-feature encoder is divided into a grid-level feature encoder and a region-level feature encoder. The grid-level feature encoder is a convolutional neural network embedded with squeeze-and-excitation operations so that the model can focus on features that are highly correlated to the title. The region-level encoder employs node-embedding matrices to enable models to understand different node types and gain richer semantics. Then the features are weighted together by an attention mechanism to guide the decoder LSTM to generate an image caption. Our model was trained and tested on the MS COCO2014 dataset, achieving a BLEU-4 score of 0.399 and a CIDEr score of 1.311. The experimental results indicate that the model can describe the image in detail.
APA, Harvard, Vancouver, ISO, and other styles
43

Ma, Jingjing, Linlin Wu, Xu Tang, Fang Liu, Xiangrong Zhang, and Licheng Jiao. "Building Extraction of Aerial Images by a Global and Multi-Scale Encoder-Decoder Network." Remote Sensing 12, no. 15 (July 22, 2020): 2350. http://dx.doi.org/10.3390/rs12152350.

Full text
Abstract:
Semantic segmentation is an important and challenging task in the aerial image community since it can extract the target level information for understanding the aerial image. As a practical application of aerial image semantic segmentation, building extraction always attracts researchers’ attention as the building is the specific land cover in the aerial images. There are two key points for building extraction from aerial images. One is learning the global and local features to fully describe the buildings with diverse shapes. The other one is mining the multi-scale information to discover the buildings with different resolutions. Taking these two key points into account, we propose a new method named global multi-scale encoder-decoder network (GMEDN) in this paper. Based on the encoder-decoder framework, GMEDN is developed with a local and global encoder and a distilling decoder. The local and global encoder aims at learning the representative features from the aerial images for describing the buildings, while the distilling decoder focuses on exploring the multi-scale information for the final segmentation masks. Combining them together, the building extraction is accomplished in an end-to-end manner. The effectiveness of our method is validated by the experiments counted on two public aerial image datasets. Compared with some existing methods, our model can achieve better performance.
APA, Harvard, Vancouver, ISO, and other styles
44

Yasrab, Robail. "ECRU: An Encoder-Decoder Based Convolution Neural Network (CNN) for Road-Scene Understanding." Journal of Imaging 4, no. 10 (October 8, 2018): 116. http://dx.doi.org/10.3390/jimaging4100116.

Full text
Abstract:
This research presents the idea of a novel fully-Convolutional Neural Network (CNN)-based model for probabilistic pixel-wise segmentation, titled Encoder-decoder-based CNN for Road-Scene Understanding (ECRU). Lately, scene understanding has become an evolving research area, and semantic segmentation is the most recent method for visual recognition. Among vision-based smart systems, the driving assistance system turns out to be a much preferred research topic. The proposed model is an encoder-decoder that performs pixel-wise class predictions. The encoder network is composed of a VGG-19 layer model, while the decoder network uses 16 upsampling and deconvolution units. The encoder of the network has a very flexible architecture that can be altered and trained for any size and resolution of images. The decoder network upsamples and maps the low-resolution encoder’s features. Consequently, there is a substantial reduction in the trainable parameters, as the network recycles the encoder’s pooling indices for pixel-wise classification and segmentation. The proposed model is intended to offer a simplified CNN model with less overhead and higher performance. The network is trained and tested on the famous road scenes dataset CamVid and offers outstanding outcomes in comparison to similar early approaches like FCN and VGG16 in terms of performance vs. trainable parameters.
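The index-recycling idea mentioned at the end of the abstract — the decoder reuses the encoder's pooling indices instead of learning upsampling weights (the SegNet trick) — can be sketched for a single channel and a 2x2 pool. NumPy, illustrative only:

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max pooling that also records where each maximum came from."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2))
    indices = np.zeros((h // 2, w // 2), dtype=int)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            block = x[i:i + 2, j:j + 2]
            k = int(block.argmax())                 # position within the 2x2 block
            pooled[i // 2, j // 2] = block.flat[k]
            indices[i // 2, j // 2] = (i + k // 2) * w + (j + k % 2)  # flat index
    return pooled, indices

def unpool(pooled, indices, shape):
    """Decoder upsampling: scatter values back to the encoder's argmax positions."""
    out = np.zeros(shape)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

x = np.arange(16.0).reshape(4, 4)
p, idx = max_pool_with_indices(x)
u = unpool(p, idx, x.shape)
```

Because the decoder only scatters by stored indices, it needs no trainable upsampling parameters — the source of the parameter reduction the abstract describes.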
APA, Harvard, Vancouver, ISO, and other styles
45

Li, Jiangyun, Peng Yao, Longteng Guo, and Weicun Zhang. "Boosted Transformer for Image Captioning." Applied Sciences 9, no. 16 (August 9, 2019): 3260. http://dx.doi.org/10.3390/app9163260.

Full text
Abstract:
Image captioning attempts to generate a description given an image, usually taking Convolutional Neural Network as the encoder to extract the visual features and a sequence model, among which the self-attention mechanism has achieved advanced progress recently, as the decoder to generate descriptions. However, this predominant encoder-decoder architecture has some problems to be solved. On the encoder side, without the semantic concepts, the extracted visual features do not make full use of the image information. On the decoder side, the sequence self-attention only relies on word representations, lacking the guidance of visual information and easily influenced by the language prior. In this paper, we propose a novel boosted transformer model with two attention modules for the above-mentioned problems, i.e., “Concept-Guided Attention” (CGA) and “Vision-Guided Attention” (VGA). Our model utilizes CGA in the encoder, to obtain the boosted visual features by integrating the instance-level concepts into the visual features. In the decoder, we stack VGA, which uses the visual information as a bridge to model internal relationships among the sequences and can be an auxiliary module of sequence self-attention. Quantitative and qualitative results on the Microsoft COCO dataset demonstrate the better performance of our model than the state-of-the-art approaches.
APA, Harvard, Vancouver, ISO, and other styles
46

Megha, Puttaswamy, Belegehalli Siddaiah Premananda, and Nagavika Kamat. "Area and energy optimized Hamming encoder and decoder for nano-communication." Journal of Electrical Engineering 75, no. 3 (June 1, 2024): 229–36. http://dx.doi.org/10.2478/jee-2024-0028.

Full text
Abstract:
The Hamming code, a linear block code, is used in communication to identify and repair errors. Redundancy bits are introduced to the Hamming communication network (HCN) for error detection and correction; the code can detect two errors and correct one error. Quantum-dot Cellular Automata (QCA) is used for designing circuits with high switching speed and low energy dissipation. This work proposes a cost-effective QCA-based coplanar (7, 4) Hamming encoder and decoder design, optimizing area, cost, and energy dissipation. The Hamming encoder is designed using a coplanar structure, while the error detector used in the Hamming decoder uses a multilayer structure. The Hamming decoder is implemented in two parts: a syndrome calculator and an error detector. The proposed (7, 4) Hamming encoder circuit reduces the cell count by 49.47% compared to [1] and 9.52% compared to [12]. The proposed (7, 4) syndrome calculator reduces the cell count by 56.54% and the total area by 11.11% compared to [1]. The proposed design reduces the cell area, QCA cost, and energy dissipation. The designs are realized and QCA parameters are assessed in QCADesigner 2.0.3, and energy is analyzed in QCADesigner-E.
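Independent of the QCA realization, the (7, 4) Hamming code itself encodes 4 data bits with 3 parity bits and corrects any single-bit error. A systematic-form sketch in NumPy:

```python
import numpy as np

# Generator and parity-check matrices for a systematic (7, 4) Hamming code:
# codeword = [d0 d1 d2 d3 | p0 p1 p2], and H * codeword^T = 0 (mod 2).
G = np.array([
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
])
H = np.array([
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
])

def encode(data4):
    """Encode 4 data bits into a 7-bit codeword (mod-2 arithmetic)."""
    return (np.array(data4) @ G) % 2

def decode(codeword7):
    """Correct up to one bit error, then return the 4 data bits."""
    codeword7 = np.array(codeword7).copy()
    syndrome = (H @ codeword7) % 2
    if syndrome.any():
        # A single-bit error makes the syndrome equal exactly one column of H.
        error_pos = next(i for i in range(7) if (H[:, i] == syndrome).all())
        codeword7[error_pos] ^= 1
    return codeword7[:4]
```

Since every column of H is distinct and nonzero, the syndrome pinpoints the flipped bit, which is what the syndrome calculator in the decoder above computes in hardware.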
APA, Harvard, Vancouver, ISO, and other styles
47

Zhao, Rui, and Shihong Du. "An Encoder–Decoder with a Residual Network for Fusing Hyperspectral and Panchromatic Remote Sensing Images." Remote Sensing 14, no. 9 (April 20, 2022): 1981. http://dx.doi.org/10.3390/rs14091981.

Full text
Abstract:
For many urban studies it is necessary to obtain remote sensing images with high hyperspectral and spatial resolution by fusing the hyperspectral and panchromatic remote sensing images. In this article, we propose a deep learning model of an encoder–decoder with a residual network (EDRN) for remote sensing image fusion. First, we combined the hyperspectral and panchromatic remote sensing images to circumvent the independence of the hyperspectral and panchromatic image features. Second, we established an encoder–decoder network for extracting representative encoded and decoded deep features. Finally, we established residual networks between the encoder network and the decoder network to enhance the extracted deep features. We evaluated the proposed method on six groups of real-world hyperspectral and panchromatic image datasets, and the experimental results confirmed the superior performance of the proposed method versus six other methods.
APA, Harvard, Vancouver, ISO, and other styles
48

Kim, Hyun-Koo, Kook-Yeol Yoo, and Ho-Youl Jung. "Color Image Generation from LiDAR Reflection Data by Using Selected Connection UNET." Sensors 20, no. 12 (June 15, 2020): 3387. http://dx.doi.org/10.3390/s20123387.

Full text
Abstract:
In this paper, a modified encoder-decoder structured fully convolutional network (ED-FCN) is proposed to generate the camera-like color image from the light detection and ranging (LiDAR) reflection image. Previously, we showed the possibility to generate a color image from a heterogeneous source using the asymmetric ED-FCN. In addition, modified ED-FCNs, i.e., UNET and selected connection UNET (SC-UNET), have been successfully applied to the biomedical image segmentation and concealed-object detection for military purposes, respectively. In this paper, we apply the SC-UNET to generate a color image from a heterogeneous image. Various connections between encoder and decoder are analyzed. The LiDAR reflection image has only 5.28% valid values, i.e., its data are extremely sparse. The severe sparseness of the reflection image limits the generation performance when the UNET is applied directly to this heterogeneous image generation. In this paper, we present a methodology of network connection in SC-UNET that considers the sparseness of each level in the encoder network and the similarity between the same levels of encoder and decoder networks. The simulation results show that the proposed SC-UNET with the connection between encoder and decoder at two lowest levels yields improvements of 3.87 dB and 0.17 in peak signal-to-noise ratio and structural similarity, respectively, over the conventional asymmetric ED-FCN. The methodology presented in this paper would be a powerful tool for generating data from heterogeneous sources.
APA, Harvard, Vancouver, ISO, and other styles
49

Chen, Yu, Ming Yin, Yu Li, and Qian Cai. "CSU-Net: A CNN-Transformer Parallel Network for Multimodal Brain Tumour Segmentation." Electronics 11, no. 14 (July 16, 2022): 2226. http://dx.doi.org/10.3390/electronics11142226.

Full text
Abstract:
Medical image segmentation techniques are vital to medical image processing and analysis. Considering the significant clinical applications of brain tumour image segmentation, it represents a focal point of medical image segmentation research. Most of the work in recent times has been centred on Convolutional Neural Networks (CNN) and Transformers. However, CNN has some deficiencies in modelling long-distance information transfer and contextual processing information, while Transformer is relatively weak in acquiring local information. To overcome the above defects, we propose a novel segmentation network with an “encoder–decoder” architecture, namely CSU-Net. The encoder consists of two parallel feature extraction branches based on CNN and Transformer, respectively, in which the features of the same size are fused. The decoder has a dual Swin Transformer decoder block with two learnable parameters for feature upsampling. The features from multiple resolutions in the encoder and decoder are merged via skip connections. On the BraTS 2020, our model achieves 0.8927, 0.8857, and 0.8188 for the Whole Tumour (WT), Tumour Core (TC), and Enhancing Tumour (ET), respectively, in terms of Dice scores.
APA, Harvard, Vancouver, ISO, and other styles
50

A. Naji, Sinan, and Noha Majeed Saleh. "Digital Image Forgery Detection And Localization Using The Innovated U-Net." Iraqi Journal for Computers and Informatics 50, no. 1 (June 29, 2024): 195–207. http://dx.doi.org/10.25195/ijci.v50i1.484.

Full text
Abstract:
A reliable image copy–move forgery detection approach adaptable to different scenarios of tampering with color images is crucial for many applications. Various methods and solutions have been proposed, but they are still subject to false positive/negative detections and cannot handle the full variety of copy–move forgeries. In this paper, a machine learning model that combines the ResNet50 and U-Net architectures for automatic image forgery detection and localization in color images is presented. The proposed system uses the ResNet50 architecture as an encoder and the U-Net architecture as a decoder. The encoder applies convolution and normalization for feature extraction, while the decoder localizes the spatial features. The decoder in the U-Net network comprises multiple decoder blocks, which are connected to the corresponding encoder blocks by concatenate layers. A binary mask is then produced to represent the tampered regions in the image. Quantitative experimental results on two standard public datasets and a comparison with state-of-the-art methods demonstrate the effectiveness and robustness of the proposed model.
APA, Harvard, Vancouver, ISO, and other styles