Journal articles on the topic 'Encoder and decoder feature'

Consult the top 50 journal articles for your research on the topic 'Encoder and decoder feature.'


1

Shim, Jae-hun, Hyunwoo Yu, Kyeongbo Kong, and Suk-Ju Kang. "FeedFormer: Revisiting Transformer Decoder for Efficient Semantic Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 2 (2023): 2263–71. http://dx.doi.org/10.1609/aaai.v37i2.25321.

Abstract:
With the success of Vision Transformer (ViT) in image classification, its variants have yielded great success in many downstream vision tasks. Among those, the semantic segmentation task has also benefited greatly from the advance of ViT variants. However, most studies of the transformer for semantic segmentation only focus on designing efficient transformer encoders, rarely giving attention to designing the decoder. Several studies make attempts in using the transformer decoder as the segmentation decoder with class-wise learnable query. Instead, we aim to directly use the encoder features as
2

Wen, Ying, Kai Xie, and Lianghua He. "Segmenting Medical MRI via Recurrent Decoding Cell." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 12452–59. http://dx.doi.org/10.1609/aaai.v34i07.6932.

Abstract:
The encoder-decoder networks are commonly used in medical image segmentation due to their remarkable performance in hierarchical feature fusion. However, the expanding path for feature decoding and spatial recovery does not consider the long-term dependency when fusing feature maps from different layers, and the universal encoder-decoder network does not make full use of the multi-modality information to improve the network robustness especially for segmenting medical MRI. In this paper, we propose a novel feature fusion unit called Recurrent Decoding Cell (RDC) which leverages convolutional R
3

Sun, Jun, Junbo Zhang, Xuesong Gao, et al. "Fusing Spatial Attention with Spectral-Channel Attention Mechanism for Hyperspectral Image Classification via Encoder–Decoder Networks." Remote Sensing 14, no. 9 (2022): 1968. http://dx.doi.org/10.3390/rs14091968.

Abstract:
In recent years, convolutional neural networks (CNNs) have been widely used in hyperspectral image (HSI) classification. However, feature extraction on hyperspectral data still faces numerous challenges. Existing methods cannot extract spatial and spectral-channel contextual information in a targeted manner. In this paper, we propose an encoder–decoder network that fuses spatial attention and spectral-channel attention for HSI classification from three public HSI datasets to tackle these issues. In terms of feature information fusion, a multi-source attention mechanism including spatial and sp
4

Alharbi, Majed, Ahmed Stohy, Mohammed Elhenawy, Mahmoud Masoud, and Hamiden El-Wahed Khalifa. "Solving Traveling Salesman Problem with Time Windows Using Hybrid Pointer Networks with Time Features." Sustainability 13, no. 22 (2021): 12906. http://dx.doi.org/10.3390/su132212906.

Abstract:
This paper introduces a time-efficient deep learning-based solution to the traveling salesman problem with time window (TSPTW). Our goal is to reduce the total tour length traveled by the agent without violating any time limitations. This will aid in decreasing the time required to supply any type of service, as well as lowering the emissions produced by automobiles, allowing our planet to recover from air pollution emissions. The proposed model is a variation of the pointer networks that has a better ability to encode the TSPTW problems. The model proposed in this paper is inspired from our
5

Ai, Xinbo, Yunhao Xie, Yinan He, and Yi Zhou. "Improve SegNet with feature pyramid for road scene parsing." E3S Web of Conferences 260 (2021): 03012. http://dx.doi.org/10.1051/e3sconf/202126003012.

Abstract:
Road scene parsing is a common task in semantic segmentation. Its images have characteristics of containing complex scene context and differing greatly among targets of the same category from different scales. To address these problems, we propose a semantic segmentation model combined with edge detection. We extend the segmentation network with an encoder-decoder structure by adding an edge feature pyramid module, namely Edge Feature Pyramid Network (EFPNet, for short). This module uses edge detection operators to get boundary information and then combines the multiscale features to improve t
6

Jiang, S. L., G. Li, W. Yao, Z. H. Hong, and T. Y. Kuc. "DUAL PYRAMIDS ENCODER-DECODER NETWORK FOR SEMANTIC SEGMENTATION IN GROUND AND AERIAL VIEW IMAGES." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B2-2020 (August 12, 2020): 605–10. http://dx.doi.org/10.5194/isprs-archives-xliii-b2-2020-605-2020.

Abstract:
Semantic segmentation is a fundamental research task in computer vision, which intends to assign a certain category to every pixel. Currently, most existing methods only utilize the deepest feature map for decoding, while high-level features get inevitably lost during the procedure of down-sampling. In the decoder section, transposed convolution or bilinear interpolation was widely used to restore the size of the encoded feature map; however, few optimizations are applied during up-sampling process which is detrimental to the performance for grouping and classification. In this work,
7

Abdulaziz AlArfaj, Abeer, and Hanan Ahmed Hosni Mahmoud. "A Moving Object Tracking Technique Using Few Frames with Feature Map Extraction and Feature Fusion." ISPRS International Journal of Geo-Information 11, no. 7 (2022): 379. http://dx.doi.org/10.3390/ijgi11070379.

Abstract:
Moving object tracking techniques using machine and deep learning require large datasets for neural model training. New strategies need to be invented that utilize smaller data training sizes to realize the impact of large-sized datasets. However, current research does not balance the training data size and neural parameters, which creates the problem of inadequacy of the information provided by the low visual data content for parameter optimization. To enhance the performance of moving object tracking that appears in only a few frames, this research proposes a deep learning model using an abu
8

Wang, Hongquan, Xinshan Zhu, Chao Ren, Lan Zhang, and Shugen Ma. "A Frequency Attention-Based Dual-Stream Network for Image Inpainting Forensics." Mathematics 11, no. 12 (2023): 2593. http://dx.doi.org/10.3390/math11122593.

Abstract:
The rapid development of digital image inpainting technology is causing serious hidden danger to the security of multimedia information. In this paper, a deep network called frequency attention-based dual-stream network (FADS-Net) is proposed for locating the inpainting region. FADS-Net is established by a dual-stream encoder and an attention-based blue-associative decoder. The dual-stream encoder includes two feature extraction streams, the raw input stream (RIS) and the frequency recalibration stream (FRS). RIS directly captures feature maps from the raw input, while FRS performs feature ext
9

Li, Xin, Feng Xu, Runliang Xia, et al. "Encoding Contextual Information by Interlacing Transformer and Convolution for Remote Sensing Imagery Semantic Segmentation." Remote Sensing 14, no. 16 (2022): 4065. http://dx.doi.org/10.3390/rs14164065.

Abstract:
Contextual information plays a pivotal role in the semantic segmentation of remote sensing imagery (RSI) due to the imbalanced distributions and ubiquitous intra-class variants. The emergence of the transformer intrigues the revolution of vision tasks with its impressive scalability in establishing long-range dependencies. However, the local patterns, such as inherent structures and spatial details, are broken with the tokenization of the transformer. Therefore, the ICTNet is devised to confront the deficiencies mentioned above. Principally, ICTNet inherits the encoder–decoder architecture. Fi
10

Geng, Yaogang, Hongyan Mei, Xiaorong Xue, and Xing Zhang. "Image-Caption Model Based on Fusion Feature." Applied Sciences 12, no. 19 (2022): 9861. http://dx.doi.org/10.3390/app12199861.

Abstract:
The encoder–decoder framework is the main frame of image captioning. The convolutional neural network (CNN) is usually used to extract grid-level features of the image, and the graph convolutional neural network (GCN) is used to extract the image’s region-level features. Grid-level features are poor in semantic information, such as the relationship and location of objects, while regional features lack fine-grained information about images. To address this problem, this paper proposes a fusion-features-based image-captioning model, which includes the fusion feature encoder and LSTM decoder. The
11

Bai, Xiaowei, Yonghong Zhang, and Jujie Wei. "LGFUNet: A Water Extraction Network in SAR Images Based on Multiscale Local Features with Global Information." Sensors 25, no. 12 (2025): 3814. https://doi.org/10.3390/s25123814.

Abstract:
To address existing issues in water extraction from SAR images based on deep learning, such as confusion between mountain shadows and water bodies and difficulty in extracting complex boundary details for continuous water bodies, the LGFUNet model is proposed. The LGFUNet model consists of three parts: the encoder–decoder, the DECASPP module, and the LGFF module. In the encoder–decoder, the Swin-Transformer module is used instead of convolution kernels for feature extraction, enhancing the learning of global information and improving the model’s ability to capture the spatial features of conti
12

Ma, Shangchen, and Chunlin Song. "Semi-Supervised Drivable Road Segmentation with Expanded Feature Cross-Consistency." Applied Sciences 13, no. 21 (2023): 12036. http://dx.doi.org/10.3390/app132112036.

Abstract:
Drivable road segmentation aims to sense the surrounding environment to keep vehicles within safe road boundaries, which is fundamental in Advance Driver-Assistance Systems (ADASs). Existing deep learning-based supervised methods are able to achieve good performance in this field with large amounts of human-labeled training data. However, the process of collecting sufficient fine human-labeled data is extremely time-consuming and expensive. To fill this gap, in this paper, we innovatively propose a general yet effective semi-supervised method for drivable road segmentation with lower labeled-d
13

Zhao, Rui, and Shihong Du. "An Encoder–Decoder with a Residual Network for Fusing Hyperspectral and Panchromatic Remote Sensing Images." Remote Sensing 14, no. 9 (2022): 1981. http://dx.doi.org/10.3390/rs14091981.

Abstract:
For many urban studies it is necessary to obtain remote sensing images with high hyperspectral and spatial resolution by fusing the hyperspectral and panchromatic remote sensing images. In this article, we propose a deep learning model of an encoder–decoder with a residual network (EDRN) for remote sensing image fusion. First, we combined the hyperspectral and panchromatic remote sensing images to circumvent the independence of the hyperspectral and panchromatic image features. Second, we established an encoder–decoder network for extracting representative encoded and decoded deep features. Fi
14

Jiang, DingLin, Xinwei Luo, and Qifan Shen. "Frequency line detection in spectrograms using a deep neural network with attention." Journal of the Acoustical Society of America 156, no. 5 (2024): 3204–16. http://dx.doi.org/10.1121/10.0034360.

Abstract:
In this paper, a frequency line detection network (FLDNet) is proposed to effectively detect multiple weak frequency lines and time-varying frequency lines in underwater acoustic signals under low signal-to-noise ratios (SNRs). FLDNet adopts an encoder-decoder architecture as the basic framework, where the encoder is designed to obtain multilevel features of the frequency lines, and the decoder is responsible for reconstructing the frequency lines. FLDNet includes attention-based feature fusion modules that combine deep semantic features with shallow features learned by the encoder to reduce n
15

Shi, Hongwei, Shiqi Wu, Minghao Ye, and Changda Ma. "A speech separation model improved based on Conv-TasNet network." Journal of Physics: Conference Series 2858, no. 1 (2024): 012033. http://dx.doi.org/10.1088/1742-6596/2858/1/012033.

Abstract:
In the field of single-channel speech separation, the extraction and separation of features from mixed audio have always been the focus and difficulty of research. Currently, mainstream methods mainly suffer from poor generalization ability and issues such as inadequate feature extraction, which leads to the models’ inferior separation capability. This paper proposes an improved DConv-TasNet network model, focusing on the optimization of the encoder/decoder modules and separation modules and utilizing deep dilated encoders/decoders to extract features from mixed speech signals. It enh
16

Lan, Meng, Jing Zhang, Fengxiang He, and Lefei Zhang. "Siamese Network with Interactive Transformer for Video Object Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (2022): 1228–36. http://dx.doi.org/10.1609/aaai.v36i2.20009.

Abstract:
Semi-supervised video object segmentation (VOS) refers to segmenting the target object in remaining frames given its annotation in the first frame, which has been actively studied in recent years. The key challenge lies in finding effective ways to exploit the spatio-temporal context of past frames to help learn discriminative target representation of current frame. In this paper, we propose a novel Siamese network with a specifically designed interactive transformer, called SITVOS, to enable effective context propagation from historical to current frames. Technically, we use the transformer e
17

Sharma, Neha, Sheifali Gupta, Mana Saleh Al Reshan, Adel Sulaiman, Hani Alshahrani, and Asadullah Shaikh. "EfficientNetB0 cum FPN Based Semantic Segmentation of Gastrointestinal Tract Organs in MRI Scans." Diagnostics 13, no. 14 (2023): 2399. http://dx.doi.org/10.3390/diagnostics13142399.

Abstract:
The segmentation of gastrointestinal (GI) organs is crucial in radiation therapy for treating GI cancer. It allows for developing a targeted radiation therapy plan while minimizing radiation exposure to healthy tissue, improving treatment success, and decreasing side effects. Medical diagnostics in GI tract organ segmentation is essential for accurate disease detection, precise differential diagnosis, optimal treatment planning, and efficient disease monitoring. This research presents a hybrid encoder–decoder-based model for segmenting healthy organs in the GI tract in biomedical images of can
18

Wang, Guixian, Dandan Huang, ZhenYe Geng, Zhi Liu, and Jin Duan. "A Novel Encoder-Decoder Structure-based Transformer for Fine-Resolution Remote Sensing Images." Journal of Physics: Conference Series 2517, no. 1 (2023): 012017. http://dx.doi.org/10.1088/1742-6596/2517/1/012017.

Abstract:
Full convolution neural network (FCN) based on an encoder-decoder structure has become a standard network in the semantic segmentation domain. Encoder-decoder architecture is an effective means to get finer-grained performance. Encoders constantly extract multilevel features, and then use decoders to gradually introduce low-level features into high-level features. Context information is critical for accurate segmentation, which is the main direction of semantic segmentation at present. So many efforts have been made to make better use of this kind of information, including codec struc
19

He, Haiqing, Yan Wei, Fuyang Zhou, and Hai Zhang. "A Deep Neural Network for Road Extraction with the Capability to Remove Foreign Objects with Similar Spectra." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-1-2024 (May 10, 2024): 193–99. http://dx.doi.org/10.5194/isprs-archives-xlviii-1-2024-193-2024.

Abstract:
Existing road extraction methods based on deep learning often struggle with distinguishing ground objects that share similar spectral information, such as roads and buildings. Consequently, this study proposes a dual encoder-decoder deep neural network to address road extraction in complex backgrounds. In the feature extraction stage, the first encoder-decoder is designed for extracting road features. The second encoder-decoder is utilized for extracting building features. During the feature fusion stage, road features and building features are integrated using a subtraction method. The re
20

Li, Hao, Sha Cao, Siyu Jiang, and Tongyang Pan. "Residual Dual Encoder Network using Distance Metric Learning for Intelligent Fault Recognition with Unknown Classes." Journal of Physics: Conference Series 2999, no. 1 (2025): 012004. https://doi.org/10.1088/1742-6596/2999/1/012004.

Abstract:
The paper proposes a residual dual encoder network using distance metric learning for intelligent fault recognition with unknown classes. The network is made up of two encoders and one decoder. In both the encoders and the decoder, residual blocks are used as the main structure for deep feature extraction. Besides, distance metric learning with triplet loss is used to train the residual dual encoder network to obtain features which could represent different health conditions. Benefiting from the metric learning principle, the proposed model could recognize the potential faults in mech
21

Li, Weisheng, Minghao Xiang, and Xuesong Liang. "A Dense Encoder–Decoder Network with Feedback Connections for Pan-Sharpening." Remote Sensing 13, no. 22 (2021): 4505. http://dx.doi.org/10.3390/rs13224505.

Abstract:
To meet the need for multispectral images having high spatial resolution in practical applications, we propose a dense encoder–decoder network with feedback connections for pan-sharpening. Our network consists of four parts. The first part consists of two identical subnetworks, one each to extract features from PAN and MS images, respectively. The second part is an efficient feature-extraction block. We hope that the network can focus on features at different scales, so we propose innovative multiscale feature-extraction blocks that fully extract effective features from networks of various dep
22

Wang, Yiyu, Jungang Xu, and Yingfei Sun. "End-to-End Transformer Based Model for Image Captioning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (2022): 2585–94. http://dx.doi.org/10.1609/aaai.v36i3.20160.

Abstract:
CNN-LSTM based architectures have played an important role in image captioning, but limited by the training efficiency and expression ability, researchers began to explore the CNN-Transformer based models and achieved great success. Meanwhile, almost all recent works adopt Faster R-CNN as the backbone encoder to extract region-level features from given images. However, Faster R-CNN needs a pre-training on an additional dataset, which divides the image captioning task into two stages and limits its potential applications. In this paper, we build a pure Transformer-based model, which integrates
23

Li, Zhong, Hongyi Wang, Qi Han, et al. "Convolutional Neural Network with Multiscale Fusion and Attention Mechanism for Skin Diseases Assisted Diagnosis." Computational Intelligence and Neuroscience 2022 (June 14, 2022): 1–10. http://dx.doi.org/10.1155/2022/8390997.

Abstract:
Melanoma segmentation based on a convolutional neural network (CNN) has recently attracted extensive attention. However, the features captured by CNN are always local that result in discontinuous feature extraction. To solve this problem, we propose a novel multiscale feature fusion network (MSFA-Net). MSFA-Net can extract feature information at different scales through a multiscale feature fusion structure (MSF) in the network and then calibrate and restore the extracted information to achieve the purpose of melanoma segmentation. Specifically, based on the popular encoder-decoder structure,
24

Javaloy, Adrián, and Ginés García-Mateos. "Text Normalization Using Encoder–Decoder Networks Based on the Causal Feature Extractor." Applied Sciences 10, no. 13 (2020): 4551. http://dx.doi.org/10.3390/app10134551.

Abstract:
The encoder–decoder architecture is a well-established, effective and widely used approach in many tasks of natural language processing (NLP), among other domains. It consists of two closely-collaborating components: An encoder that transforms the input into an intermediate form, and a decoder producing the output. This paper proposes a new method for the encoder, named Causal Feature Extractor (CFE), based on three main ideas: Causal convolutions, dilatations and bidirectionality. We apply this method to text normalization, which is a ubiquitous problem that appears as the first step of many
25

Nguyen, Quoc Toan. "Defective sewing stitch semantic segmentation using DeeplabV3+ and EfficientNet." Inteligencia Artificial 25, no. 70 (2022): 64–76. http://dx.doi.org/10.4114/intartif.vol25iss70pp64-76.

Abstract:
Defective stitch inspection is an essential part of garment manufacturing quality assurance. Traditional mechanical defect detection systems are effective, but they are usually customized with handcrafted features that must be operated by a human. Deep learning approaches have recently demonstrated exceptional performance in a wide range of computer vision applications. The requirement for precise detail evaluation, combined with the small size of the patterns, undoubtedly increases the difficulty of identification. Therefore, image segmentation (semantic segmentation) was employed for this ta
26

Chen, Yu, Ming Yin, Yu Li, and Qian Cai. "CSU-Net: A CNN-Transformer Parallel Network for Multimodal Brain Tumour Segmentation." Electronics 11, no. 14 (2022): 2226. http://dx.doi.org/10.3390/electronics11142226.

Abstract:
Medical image segmentation techniques are vital to medical image processing and analysis. Considering the significant clinical applications of brain tumour image segmentation, it represents a focal point of medical image segmentation research. Most of the work in recent times has been centred on Convolutional Neural Networks (CNN) and Transformers. However, CNN has some deficiencies in modelling long-distance information transfer and contextual processing information, while Transformer is relatively weak in acquiring local information. To overcome the above defects, we propose a novel segmenta
27

Zhai, Cong, Liejun Wang, and Jian Yuan. "New Fusion Network with Dual-Branch Encoder and Triple-Branch Decoder for Remote Sensing Image Change Detection." Applied Sciences 13, no. 10 (2023): 6167. http://dx.doi.org/10.3390/app13106167.

Abstract:
Deep learning plays a highly essential role in the domain of remote sensing change detection (CD) due to its high efficiency. From some existing methods, we can observe that the fusion of information at each scale is quite vital for the accuracy of the CD results, especially for the common problems of pseudo-change and the difficult detection of change edges in the CD task. With this in mind, we propose a New Fusion network with Dual-branch Encoder and Triple-branch Decoder (DETDNet) that follows a codec structure as a whole, where the encoder adopts a siamese Res2Net-50 structure to extract t
28

Liu, Song, Haiwei Li, Feifei Wang, et al. "Unsupervised Transformer Boundary Autoencoder Network for Hyperspectral Image Change Detection." Remote Sensing 15, no. 7 (2023): 1868. http://dx.doi.org/10.3390/rs15071868.

Abstract:
In the field of remote sensing, change detection is an important monitoring technology. However, effectively extracting the change feature is still a challenge, especially with an unsupervised method. To solve this problem, we proposed an unsupervised transformer boundary autoencoder network (UTBANet) in this paper. UTBANet consists of a transformer structure and spectral attention in the encoder part. In addition to reconstructing hyperspectral images, UTBANet also adds a decoder branch for reconstructing edge information. The designed encoder module is used to extract features. First, the tran
29

Fang, Han, Yupeng Qiu, Kejiang Chen, Jiyi Zhang, Weiming Zhang, and Ee-Chien Chang. "Flow-Based Robust Watermarking with Invertible Noise Layer for Black-Box Distortions." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 4 (2023): 5054–61. http://dx.doi.org/10.1609/aaai.v37i4.25633.

Abstract:
Deep learning-based digital watermarking frameworks have been widely studied recently. Most existing methods adopt an "encoder-noise layer-decoder"-based architecture where the embedding and extraction processes are accomplished separately by the encoder and the decoder. However, one potential drawback of such a framework is that the encoder and the decoder may not be well coupled, resulting in the fact that the encoder may embed some redundant features into the host image thus influencing the invisibility and robustness of the whole algorithm. To address this limitation, this paper proposes
30

Zhang, Yusha, and Xiongliang Xiao. "A Dynamic Community Detection Method for Complex Networks Based on Deep Self-Coding Network." Computational Intelligence and Neuroscience 2022 (July 31, 2022): 1–9. http://dx.doi.org/10.1155/2022/7084084.

Abstract:
Aiming at the problem of community detection in complex dynamic networks, a dynamic community detection method based on graph convolution neural network is proposed. An encoding-decoding mechanism is designed to reconstruct the feature information of each node in the graph. A stack of multiple graph convolutional layers is considered as an encoder that encodes the node feature information into the potential vector space, while the decoder employs a simple two-layer perceptron to reconstruct the initial node features from the encoded vector information. The encoding-decoding mechanism achieves
31

Lei, Zhi, Guixian Zhang, Lijuan Wu, Kui Zhang, and Rongjiao Liang. "A Multi-level Mesh Mutual Attention Model for Visual Question Answering." Data Science and Engineering 7, no. 4 (2022): 339–53. http://dx.doi.org/10.1007/s41019-022-00200-9.

Abstract:
Visual question answering is a complex multimodal task involving images and text, with broad application prospects in human–computer interaction and medical assistance. Therefore, how to deal with the feature interaction and multimodal feature fusion between the critical regions in the image and the keywords in the question is an important issue. To this end, we propose a neural network based on the encoder–decoder structure of the transformer architecture. Specifically, in the encoder, we use multi-head self-attention to mine word–word connections within question features and stack mu
32

Liu, C., Y. Zhang, and Y. Ou. "COMPONENT SUBSTITUTION NETWORK FOR PAN-SHARPENING VIA SEMI-SUPERVISED LEARNING." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences V-3-2020 (August 3, 2020): 255–62. http://dx.doi.org/10.5194/isprs-annals-v-3-2020-255-2020.

Abstract:
Pan-sharpening refers to the technology which fuses a low resolution multispectral image (MS) and a high resolution panchromatic (PAN) image into a high resolution multispectral image (HRMS). In this paper, we propose a Component Substitution Network (CSN) for pan-sharpening. By adding a feature exchange module (FEM) to the widely used encoder-decoder framework, we design a network following the general procedure of the traditional component substitution (CS) approaches. Encoder of the network decomposes the input image into spectral feature and structure feature. The FEM regroups th
33

Khanh, Trinh Le Ba, Duy-Phuong Dao, Ngoc-Huynh Ho, et al. "Enhancing U-Net with Spatial-Channel Attention Gate for Abnormal Tissue Segmentation in Medical Imaging." Applied Sciences 10, no. 17 (2020): 5729. http://dx.doi.org/10.3390/app10175729.

Abstract:
In recent years, deep learning has dominated medical image segmentation. Encoder-decoder architectures, such as U-Net, can be used in state-of-the-art models with powerful designs that are achieved by implementing skip connections that propagate local information from an encoder path to a decoder path to retrieve detailed spatial information lost by pooling operations. Despite their effectiveness for segmentation, these naïve skip connections still have some disadvantages. First, multi-scale skip connections tend to use unnecessary information and computational sources, where likable low-level
34

Zheng, Chuanpan, Xiaoliang Fan, Cheng Wang, and Jianzhong Qi. "GMAN: A Graph Multi-Attention Network for Traffic Prediction." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (2020): 1234–41. http://dx.doi.org/10.1609/aaai.v34i01.5477.

Abstract:
Long-term traffic prediction is highly challenging due to the complexity of traffic systems and the constantly changing nature of many impacting factors. In this paper, we focus on the spatio-temporal factors, and propose a graph multi-attention network (GMAN) to predict traffic conditions for time steps ahead at different locations on a road network graph. GMAN adapts an encoder-decoder architecture, where both the encoder and the decoder consist of multiple spatio-temporal attention blocks to model the impact of the spatio-temporal factors on traffic conditions. The encoder encodes the input
35

Li, Rumei, Liyan Zhang, Zun Wang, and Xiaojuan Li. "FCSwinU: Fourier Convolutions and Swin Transformer UNet for Hyperspectral and Multispectral Image Fusion." Sensors 24, no. 21 (2024): 7023. http://dx.doi.org/10.3390/s24217023.

Abstract:
The fusion of low-resolution hyperspectral images (LR-HSI) with high-resolution multispectral images (HR-MSI) provides a cost-effective approach to obtaining high-resolution hyperspectral images (HR-HSI). Existing methods primarily based on convolutional neural networks (CNNs) struggle to capture global features and do not adequately address the significant scale and spectral resolution differences between LR-HSI and HR-MSI. To tackle these challenges, our novel FCSwinU network leverages the spectral fast Fourier convolution (SFFC) module for spectral feature extraction and utilizes the Swin T
36

Yang, Yong, Wenzhi Xu, Shuying Huang, and Weiguo Wan. "Low-Light Image Enhancement Network Based on Multi-Scale Feature Complementation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 3 (2023): 3214–21. http://dx.doi.org/10.1609/aaai.v37i3.25427.

Abstract:
Images captured in low-light environments have problems of insufficient brightness and low contrast, which will affect subsequent image processing tasks. Although most current enhancement methods can obtain high-contrast images, they still suffer from noise amplification and color distortion. To address these issues, this paper proposes a low-light image enhancement network based on multi-scale feature complementation (LIEN-MFC), which is a U-shaped encoder-decoder network supervised by multiple images of different scales. In the encoder, four feature extraction branches are constructed to ext
37

Li, Jianyong, Ge Gao, Lei Yang, Yanhong Liu, and Hongnian Yu. "DEF-Net: A Dual-Encoder Fusion Network for Fundus Retinal Vessel Segmentation." Electronics 11, no. 22 (2022): 3810. http://dx.doi.org/10.3390/electronics11223810.

Abstract:
The deterioration of numerous eye diseases is highly related to the fundus retinal structures, so automatic retinal vessel segmentation serves as an essential stage for efficient detection of eye-related lesions in clinical practice. Segmentation methods based on encoder-decoder structures exhibit great potential in retinal vessel segmentation tasks, but have limited feature representation ability. In addition, they do not effectively consider the information at multiple scales when performing feature fusion, resulting in low fusion efficiency. In this paper, a new model, named DEF-Net, is
38

Jiang, Ligang, Wen Li, Zhiming Xiong, et al. "Retinal Vessel Segmentation Based on Self-Attention Feature Selection." Electronics 13, no. 17 (2024): 3514. http://dx.doi.org/10.3390/electronics13173514.

Abstract:
Many major diseases can cause changes in the morphology of blood vessels, and the segmentation of retinal blood vessels is of great significance for preventing these diseases. Obtaining complete, continuous, and high-resolution segmentation results is very challenging due to the diverse structures of retinal tissues, the complex spatial structures of blood vessels, and the presence of many small vessels. In recent years, deep learning networks like UNet have been widely used in medical image processing. However, the continuous down-sampling operations in UNet can result in the loss of a signific
39

Park, Min-Hong, Jae-Hoon Cho, and Yong-Tae Kim. "CNN Model with Multilayer ASPP and Two-Step Cross-Stage for Semantic Segmentation." Machines 11, no. 2 (2023): 126. http://dx.doi.org/10.3390/machines11020126.

Abstract:
Currently, interest in deep learning-based semantic segmentation is increasing in various fields such as the medical field, automatic operation, and object division. For example, UNet, a deep learning network with an encoder–decoder structure, is used for image segmentation in the biomedical area, and networks such as DeepLab use ASPP in an attempt to segment various objects. A recent study improves the accuracy of object segmentation through structures that extend in various receptive fields. Semantic segmentation has evolved to divide objects of various sizes more accurately and in detail, and
40

Sun, Nan, Han Fang, Yuxing Lu, Chengxin Zhao, and Hefei Ling. "END^2: Robust Dual-Decoder Watermarking Framework Against Non-Differentiable Distortions." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 1 (2025): 773–81. https://doi.org/10.1609/aaai.v39i1.32060.

Abstract:
DNN-based watermarking methods have rapidly advanced, with the "Encoder-Noise Layer-Decoder" (END) framework being the most widely used. To ensure end-to-end training, the noise layer in the framework must be differentiable. However, real-world distortions are often non-differentiable, leading to challenges in end-to-end training. Existing solutions only treat the distortion perturbation as additive noise, which does not fully integrate the effect of distortion in training. To better incorporate non-differentiable distortions into training, we propose a novel dual-decoder architecture (END^2
41

Shi, Yanli, and Pengpeng Sheng. "J-Net: Asymmetric Encoder-Decoder for Medical Semantic Segmentation." Security and Communication Networks 2021 (August 30, 2021): 1–8. http://dx.doi.org/10.1155/2021/2139024.

Abstract:
With the development of deep learning, breakthroughs have been made in the field of semantic segmentation. However, it is difficult to generate a fine mask on the same medical images because medical images have low contrast, high resolution, and insufficient semantic information. In most scenarios, existing approaches use a pooling layer to reduce the resolution of feature maps. Therefore, it is difficult for them to consider the whole image features, resulting in information loss and performance degradation. In this paper, a multiscale asymmetric encoder-decoder semantic segmentation n
42

Masood, Sharjeel, Fawad Ahmed, Suliman A. Alsuhibany, et al. "A Deep Learning-Based Semantic Segmentation Architecture for Autonomous Driving Applications." Wireless Communications and Mobile Computing 2022 (June 18, 2022): 1–12. http://dx.doi.org/10.1155/2022/8684138.

Abstract:
In recent years, the development of smart transportation has accelerated research on semantic segmentation as it is one of the most important problems in this area. A large receptive field has always been the center of focus when designing convolutional neural networks for semantic segmentation. A majority of recent techniques have used maxpooling to increase the receptive field of a network at the expense of decreasing its spatial resolution. Although this idea has shown improved results in object detection applications, when it comes to semantic segmentation, a high spatial resolutio
43

Geng, Xiaoxiao, Shunping Ji, Meng Lu, and Lingli Zhao. "Multi-Scale Attentive Aggregation for LiDAR Point Cloud Segmentation." Remote Sensing 13, no. 4 (2021): 691. http://dx.doi.org/10.3390/rs13040691.

Abstract:
Semantic segmentation of LiDAR point clouds has implications in self-driving, robots, and augmented reality, among others. In this paper, we propose a Multi-Scale Attentive Aggregation Network (MSAAN) to achieve the global consistency of point cloud feature representation and superior segmentation performance. First, upon a baseline encoder-decoder architecture for point cloud segmentation, namely, RandLA-Net, an attentive skip connection was proposed to replace the commonly used concatenation to balance the encoder and decoder features of the same scales. Second, a channel attentive enhancement
44

Lu, Xuwei, Yunlong Zhang, and Congqi Zhang. "CATransU-Net: Cross-attention TransU-Net for field rice pest detection." PLOS One 20, no. 6 (2025): e0326893. https://doi.org/10.1371/journal.pone.0326893.

Abstract:
Accurate detection of rice pests in the field is a key problem in field pest control. U-Net can effectively extract local image features, and Transformer is good at dealing with long-distance dependencies. A Cross-Attention TransU-Net (CATransU-Net) model is constructed for paddy pest detection by combining U-Net and Transformer. It consists of an encoder, a decoder, a dual Transformer-attention module (DTA), and a cross-attention skip-connection (CASC), where dilated residual Inception (DRI) in the encoder is adopted to extract the multiscale features, DTA is added into the bottleneck of the model to efficient
45

Li, Boliang, Yaming Xu, Yan Wang, and Bo Zhang. "DECTNet: Dual Encoder Network combined convolution and Transformer architecture for medical image segmentation." PLOS ONE 19, no. 4 (2024): e0301019. http://dx.doi.org/10.1371/journal.pone.0301019.

Abstract:
Automatic and accurate segmentation of medical images plays an essential role in disease diagnosis and treatment planning. Convolutional neural networks have achieved remarkable results in medical image segmentation in the past decade. Meanwhile, deep learning models based on Transformer architecture have also succeeded tremendously in this domain. However, due to the ambiguity of the medical image boundary and the high complexity of physical organization structures, implementing effective structure extraction and accurate segmentation remains a problem requiring a solution. In this paper, we propose
46

Chen, Qian, Ze Liu, Yi Zhang, Keren Fu, Qijun Zhao, and Hongwei Du. "RGB-D Salient Object Detection via 3D Convolutional Neural Networks." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (2021): 1063–71. http://dx.doi.org/10.1609/aaai.v35i2.16191.

Abstract:
RGB-D salient object detection (SOD) has recently attracted increasing research interest, and many deep learning methods based on encoder-decoder architectures have emerged. However, most existing RGB-D SOD models conduct feature fusion either in the single encoder or the decoder stage, which hardly guarantees sufficient cross-modal fusion ability. In this paper, we make the first attempt at addressing RGB-D SOD through 3D convolutional neural networks. The proposed model, named RD3D, aims at pre-fusion in the encoder stage and in-depth fusion in the decoder stage to effectively promote the ful
47

Xing, Na, Jun Wang, Yuehai Wang, Keqing Ning, and Fuqiang Chen. "Point Cloud Completion Based on Nonlocal Neural Networks with Adaptive Sampling." Information Technology and Control 53, no. 1 (2024): 160–70. http://dx.doi.org/10.5755/j01.itc.53.1.34047.

Abstract:
Raw point clouds are usually sparse and incomplete, inevitably containing outliers or noise from 3D sensors. In this paper, an improved SA-Net based on an encoder-decoder structure is proposed to make it more robust in predicting complete point clouds. The encoder of the original SA-Net network is very sensitive to noise in the feature extraction process. Therefore, we use PointASNL as the encoder, which weights around the initial sampling points through the AS module (Adaptive Sampling Module) and adaptively adjusts the weight of the sampling points to effectively alleviate the bias effect of
48

Xing, Yongfeng, Luo Zhong, and Xian Zhong. "An Encoder-Decoder Network Based FCN Architecture for Semantic Segmentation." Wireless Communications and Mobile Computing 2020 (July 7, 2020): 1–9. http://dx.doi.org/10.1155/2020/8861886.

Abstract:
In recent years, the convolutional neural network (CNN) has made remarkable achievements in semantic segmentation. The method of semantic segmentation has a desirable application prospect. Nowadays, the methods mostly use an encoder-decoder architecture as a way of generating pixel-by-pixel segmentation predictions. The encoder extracts feature maps, and the decoder recovers feature map resolution. An improved semantic segmentation method on the basis of the encoder-decoder architecture is proposed. We can get better segmentation accuracy on several hard classes and reduce the computa
49

Jing, Zhenping. "A novel deep fully convolutional encoder-decoder network and similarity analysis for English education text event clustering analysis." Computer Science and Information Systems, no. 00 (2024): 62. http://dx.doi.org/10.2298/csis240418062j.

Abstract:
Education event clustering for social media aims to achieve short text clustering according to event characteristics in online social networks. Traditional text event clustering has the problem of poor classification results and large computation. Therefore, we propose a novel deep fully convolutional encoder-decoder network and similarity analysis for English education text event clustering analysis in online social networks. At the encoder end, the features of text events are extracted step by step through the convolution operation of the convolution layer. The background noise is suppressed
50

Chen, Yunfan, and Hyunchul Shin. "Pedestrian Detection at Night in Infrared Images Using an Attention-Guided Encoder-Decoder Convolutional Neural Network." Applied Sciences 10, no. 3 (2020): 809. http://dx.doi.org/10.3390/app10030809.

Abstract:
Pedestrian-related accidents are much more likely to occur during nighttime when visible (VI) cameras are much less effective. Unlike VI cameras, infrared (IR) cameras can work in total darkness. However, IR images have several drawbacks, such as low-resolution, noise, and thermal energy characteristics that can differ depending on the weather. To overcome these drawbacks, we propose an IR camera system to identify pedestrians at night that uses a novel attention-guided encoder-decoder convolutional neural network (AED-CNN). In AED-CNN, encoder-decoder modules are introduced to generate multi-