Journal articles on the topic 'Depth-wise convolution'

Consult the top 50 journal articles for your research on the topic 'Depth-wise convolution.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Hossain, Syed Mohammad Minhaz, Kaushik Deb, Pranab Kumar Dhar, and Takeshi Koshiba. "Plant Leaf Disease Recognition Using Depth-Wise Separable Convolution-Based Models." Symmetry 13, no. 3 (March 21, 2021): 511. http://dx.doi.org/10.3390/sym13030511.

Full text
Abstract:
Proper plant leaf disease (PLD) detection is challenging in complex backgrounds and under different capture conditions. For this reason, initially, modified adaptive centroid-based segmentation (ACS) is used to trace the proper region of interest (ROI). Automatic initialization of the number of clusters (K) using modified ACS before recognition increases the scalability of ROI tracing even for symmetrical features in various plants. Besides, convolutional neural network (CNN)-based PLD recognition models achieve adequate accuracy to some extent. However, the memory requirements (large-scale parameters) and high computational cost of CNN-based PLD models are pressing issues for memory-restricted mobile and IoT-based devices. Therefore, after tracing ROIs, three proposed depth-wise separable convolutional PLD (DSCPLD) models, namely segmented modified DSCPLD (S-modified MobileNet), segmented reduced DSCPLD (S-reduced MobileNet), and segmented extended DSCPLD (S-extended MobileNet), are utilized to represent the constructive trade-off among accuracy, model size, and computational latency. Moreover, we have compared our proposed DSCPLD recognition models with state-of-the-art models such as MobileNet, VGG16, VGG19, and AlexNet. Among the segmented DSCPLD models, S-modified MobileNet achieves the best accuracy of 99.55% and F1-score of 97.07%. Besides, we have simulated our DSCPLD models using both full plant leaf images and segmented plant leaf images and conclude that, after using modified ACS, all models increase their accuracy and F1-score. Furthermore, a new plant leaf dataset containing 6580 images of eight plants was used to experiment with several depth-wise separable convolution models.
APA, Harvard, Vancouver, ISO, and other styles
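For readers unfamiliar with the building block shared by the MobileNet-style models in the entry above, the following is a minimal PyTorch sketch of a depth-wise separable convolution (an illustration of the general technique, not the authors' DSCPLD architecture; the channel sizes are arbitrary):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Factorizes a standard convolution into a per-channel (depth-wise) 3x3
    convolution followed by a 1x1 point-wise convolution, the building block
    MobileNet-style models use to cut parameters and FLOPs."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch makes each 3x3 filter see exactly one input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

# A standard 3x3 conv from 64 to 128 channels uses 3*3*64*128 = 73,728 weights;
# the separable version uses 3*3*64 + 64*128 = 8,768, roughly 8.4x fewer.
block = DepthwiseSeparableConv(64, 128)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 128, 56, 56])
```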
2

Kim, Daehee, Juhee Kang, and Jaekoo Lee. "Lightweighting of Super-Resolution Model Using Depth-Wise Separable Convolution." Journal of Korean Institute of Communications and Information Sciences 46, no. 4 (April 30, 2021): 591–97. http://dx.doi.org/10.7840/kics.2021.46.4.591.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zhang, Ke, Ken Cheng, Jingjing Li, and Yuanyuan Peng. "A Channel Pruning Algorithm Based on Depth-Wise Separable Convolution Unit." IEEE Access 7 (2019): 173294–309. http://dx.doi.org/10.1109/access.2019.2956976.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Siddiqua, Shahzia, Naveena Chikkaguddaiah, Sunilkumar S. Manvi, and Manjunath Aradhya. "AksharaNet: A GPU Accelerated Modified Depth-Wise Separable Convolution for Kannada Text Classification." Revue d'Intelligence Artificielle 35, no. 2 (April 30, 2021): 145–52. http://dx.doi.org/10.18280/ria.350206.

Full text
Abstract:
For content-based indexing and retrieval applications, text characters embedded in images are a rich source of information. Owing to their different shapes, grayscale values, and dynamic backgrounds, these text characters in scene images are difficult to detect and classify. The complexity increases when the text involved is a vernacular language like Kannada. Despite advances in deep learning neural networks (DLNN), there is a dearth of fast and effective models for classifying scene text images, and of a large-scale Kannada scene character dataset to train them. In this paper, two key contributions are proposed: AksharaNet, a graphical processing unit (GPU)-accelerated modified convolutional neural network architecture consisting of linearly inverted depth-wise separable convolutions, and a Kannada Scene Individual Character (KSIC) dataset, curated from the ground up and consisting of 46,800 images. From the results it is observed that AksharaNet outperforms four other well-established models by 1.5% on CPU and 1.9% on GPU. The result can be directly attributed to the quality of the developed KSIC dataset. Early stopping decisions at 25% and 50% of epochs with good and bad accuracies for complex and light models are discussed. Also, useful findings concerning the learning-rate drop factor and its ideal application period are enumerated.
APA, Harvard, Vancouver, ISO, and other styles
5

Chao, Xiaofei, Xiao Hu, Jingze Feng, Zhao Zhang, Meili Wang, and Dongjian He. "Construction of Apple Leaf Diseases Identification Networks Based on Xception Fused by SE Module." Applied Sciences 11, no. 10 (May 18, 2021): 4614. http://dx.doi.org/10.3390/app11104614.

Full text
Abstract:
The fast and accurate identification of apple leaf diseases is beneficial for disease control and management of apple orchards. An improved network for apple leaf disease classification and a lightweight model for mobile terminal usage were designed in this paper. First, we proposed the SE-DEEP block to fuse the Squeeze-and-Excitation (SE) module with the Xception network to obtain the SE_Xception network, where the SE module is inserted between the depth-wise convolution and point-wise convolution of the depth-wise separable convolution layer. Therefore, the feature channels from the lower layers could be directly weighted, which made the model more sensitive to the principal features of the classification task. Second, we designed a lightweight network, named SE_miniXception, by reducing the depth and width of SE_Xception. Experimental results show that the average classification accuracy of SE_Xception is 99.40%, which is 1.99% higher than Xception. The average classification accuracy of SE_miniXception is 97.01%, which is 1.60% and 1.22% higher than MobileNetV1 and ShuffleNet, respectively, while its number of parameters is less than those of MobileNet and ShuffleNet. The minimized network decreases memory usage and FLOPs, and shortens the recognition time from 15 to 7 milliseconds per image. Our proposed SE-DEEP block provides a choice for improving network accuracy, and our network compression scheme provides ideas for making existing networks lightweight.
APA, Harvard, Vancouver, ISO, and other styles
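The SE-DEEP placement described in the entry above — a Squeeze-and-Excitation module inserted between the depth-wise and point-wise convolutions of a separable layer — can be sketched roughly as follows in PyTorch. The reduction ratio of 16 and the channel sizes are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel attention: global average pool -> bottleneck MLP -> sigmoid gate."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))

class SeparableConvWithSE(nn.Module):
    """Depth-wise conv -> SE gating on the per-channel features -> point-wise conv,
    so the channel weights are applied before the 1x1 mixing step."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.se = SqueezeExcite(in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.se(self.depthwise(x)))))

print(SeparableConvWithSE(32, 64)(torch.randn(1, 32, 28, 28)).shape)  # [1, 64, 28, 28]
```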
6

Kate, Vandana, and Pragya Shukla. "Breast Cancer Image Multi-Classification Using Random Patch Aggregation and Depth-Wise Convolution based Deep-Net Model." International Journal of Online and Biomedical Engineering (iJOE) 17, no. 01 (January 19, 2021): 83. http://dx.doi.org/10.3991/ijoe.v17i01.18513.

Full text
Abstract:
Adapting deep convolutional neural network models for large image classification can result in network architectures with a large number of learnable parameters, and tuning those varied parameters can considerably increase the complexity of the model. To address this problem, a convolutional Deep-Net model based on the extraction of random patches and depth-wise convolutions is proposed for the training and classification of the widely known benchmark breast cancer histopathology images. The classification results of these patches are aggregated using majority-vote casting to decide the final image classification type. It has been observed that the proposed Deep-Net model outperforms the learned features of VGG Net (16 layers) in terms of accuracy when applied to breast tumor histopathology images. The objective of this work is to examine and comprehensively analyze the sub-class classification performance of the proposed model across all optical magnification frontiers.
APA, Harvard, Vancouver, ISO, and other styles
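The patch-level prediction and majority-vote aggregation described in the entry above can be illustrated with a short sketch. The patch size, patch count, and the dummy classifier are assumptions for demonstration only:

```python
import numpy as np

def classify_image_by_patches(image, patch_classifier, patch_size=64, n_patches=20, rng=None):
    """Extract random patches, classify each, and return the majority-vote label."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    votes = []
    for _ in range(n_patches):
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        patch = image[y:y + patch_size, x:x + patch_size]
        votes.append(patch_classifier(patch))
    # majority vote over the predicted patch labels decides the image label
    return np.bincount(np.asarray(votes)).argmax()

# toy usage with a stand-in classifier that labels patches by mean intensity
dummy = lambda p: int(p.mean() > 0.5)
print(classify_image_by_patches(np.random.rand(256, 256), dummy))
```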
7

Dang, Lanxue, Peidong Pang, and Jay Lee. "Depth-Wise Separable Convolution Neural Network with Residual Connection for Hyperspectral Image Classification." Remote Sensing 12, no. 20 (October 17, 2020): 3408. http://dx.doi.org/10.3390/rs12203408.

Full text
Abstract:
The neural network-based hyperspectral image (HSI) classification model has a deep structure, which leads to an increase in training parameters, long training time, and excessive computational cost. The deepened network models are also likely to suffer from vanishing gradients, which limits further improvement of their classification accuracy. To this end, a residual unit with fewer training parameters was constructed by combining the residual connection with the depth-wise separable convolution. With the increased depth of the network, the number of output channels of each residual unit increases linearly with a small amplitude. The deepened network can continuously extract the spectral and spatial features while building a cone network structure by stacking the residual units. At the end of the model, a 1 × 1 convolution layer combined with a global average pooling layer can be used to replace the traditional fully connected layer to complete the classification with fewer parameters needed in the network. Experiments were conducted on three benchmark HSI datasets: Indian Pines, Pavia University, and Kennedy Space Center. The overall classification accuracy was 98.85%, 99.58%, and 99.96%, respectively. Compared with other classification methods, the proposed network model guarantees higher classification accuracy while spending less time on training and testing samples.
APA, Harvard, Vancouver, ISO, and other styles
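A hedged sketch of the two ideas named in the entry above: a residual unit built from depth-wise separable convolutions, and a classification head that replaces the fully connected layer with a 1 × 1 convolution plus global average pooling. The channel counts, projection shortcut, and toy input sizes are illustrative assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn

def sep_conv(in_ch, out_ch):
    # depth-wise 3x3 followed by point-wise 1x1
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
    )

class ResidualSepUnit(nn.Module):
    """Residual connection wrapped around two depth-wise separable convolutions."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(sep_conv(in_ch, out_ch), nn.ReLU(inplace=True),
                                  sep_conv(out_ch, out_ch))
        # 1x1 projection so the shortcut matches the new channel count
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1, bias=False) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return torch.relu(self.body(x) + self.shortcut(x))

class GapHead(nn.Module):
    """1x1 convolution + global average pooling in place of a fully connected layer."""
    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.cls = nn.Conv2d(in_ch, num_classes, 1)

    def forward(self, x):
        return self.cls(x).mean(dim=(2, 3))  # (N, num_classes)

net = nn.Sequential(ResidualSepUnit(30, 64), ResidualSepUnit(64, 96), GapHead(96, 16))
print(net(torch.randn(2, 30, 25, 25)).shape)  # torch.Size([2, 16])
```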
8

商, 丽娟. "Super-Resolution Reconstruction Algorithm for Cross-Module Based on Depth-Wise Separable Convolution." Journal of Image and Signal Processing 07, no. 02 (2018): 96–104. http://dx.doi.org/10.12677/jisp.2018.72011.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Huang, Gangjin, Yuanliang Zhang, and Jiayu Ou. "Transfer remaining useful life estimation of bearing using depth-wise separable convolution recurrent network." Measurement 176 (May 2021): 109090. http://dx.doi.org/10.1016/j.measurement.2021.109090.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Cho, Sung In, Jae Hyeon Park, and Suk-Ju Kang. "A Generative Adversarial Network-Based Image Denoiser Controlling Heterogeneous Losses." Sensors 21, no. 4 (February 8, 2021): 1191. http://dx.doi.org/10.3390/s21041191.

Full text
Abstract:
We propose a novel generative adversarial network (GAN)-based image denoising method that utilizes heterogeneous losses. In order to improve the restoration quality of the structural information of the generator, the heterogeneous losses, including the structural loss in addition to the conventional mean squared error (MSE)-based loss, are used to train the generator. To maximize the improvements brought on by the heterogeneous losses, the strength of the structural loss is adaptively adjusted by the discriminator for each input patch. In addition, a depth wise separable convolution-based module that utilizes the dilated convolution and symmetric skip connection is used for the proposed GAN so as to reduce the computational complexity while providing improved denoising quality compared to the convolutional neural network (CNN) denoiser. The experiments showed that the proposed method improved visual information fidelity and feature similarity index values by up to 0.027 and 0.008, respectively, compared to the existing CNN denoiser.
APA, Harvard, Vancouver, ISO, and other styles
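The denoising module described in the entry above combines depth-wise separable convolution, dilated convolution, and symmetric skip connections. A rough PyTorch sketch of that combination follows; the dilation rates, channel width, and the exact skip pattern are assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

def dilated_sep_conv(ch, dilation):
    """Depth-wise separable convolution with a dilated depth-wise kernel,
    which enlarges the receptive field without adding parameters."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=dilation, dilation=dilation, groups=ch, bias=False),
        nn.Conv2d(ch, ch, 1, bias=False),
        nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
    )

class DilatedSepBlock(nn.Module):
    """Stack of dilated separable convolutions with symmetric (mirrored) skips:
    the output of forward layer i is added after mirrored layer n-1-i."""
    def __init__(self, ch=64, dilations=(1, 2, 4)):
        super().__init__()
        self.down = nn.ModuleList([dilated_sep_conv(ch, d) for d in dilations])
        self.up = nn.ModuleList([dilated_sep_conv(ch, d) for d in reversed(dilations)])

    def forward(self, x):
        skips = []
        for layer in self.down:
            x = layer(x)
            skips.append(x)
        for layer, skip in zip(self.up, reversed(skips)):
            x = layer(x) + skip
        return x

print(DilatedSepBlock()(torch.randn(1, 64, 32, 32)).shape)  # [1, 64, 32, 32]
```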
11

Vorugunti, Chandra Sekhar, Viswanath Pulabaigari, Rama Krishna Sai Subrahmanyam Gorthi, and Prerana Mukherjee. "OSVFuseNet: Online Signature Verification by feature fusion and depth-wise separable convolution based deep learning." Neurocomputing 409 (October 2020): 157–72. http://dx.doi.org/10.1016/j.neucom.2020.05.072.

Full text
APA, Harvard, Vancouver, ISO, and other styles
12

Dankwa, Stephen, and Lu Yang. "An Efficient and Accurate Depth-Wise Separable Convolutional Neural Network for Cybersecurity Vulnerability Assessment Based on CAPTCHA Breaking." Electronics 10, no. 4 (February 18, 2021): 480. http://dx.doi.org/10.3390/electronics10040480.

Full text
Abstract:
Cybersecurity practitioners generate Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs) as a form of security mechanism in website applications, in order to differentiate between human end-users and machine bots. They tend to use standard security to implement CAPTCHAs in order to prevent hackers from writing malicious automated programs to make false website registrations and to restrict them from stealing end-users’ private information. Among the categories of CAPTCHAs, the text-based CAPTCHA is the most widely used. However, the evolution of deep learning has been so dramatic that tasks previously thought not easily addressable by computers, and therefore used as CAPTCHAs to prevent spam, are now possible to break. The workflow of CAPTCHA breaking is a combination of efforts, approaches, and the development of a computation-efficient Convolutional Neural Network (CNN) model that attempts to increase accuracy. In this study, in contrast to breaking whole CAPTCHA images simultaneously, we split four-character CAPTCHA images into individual characters with a 2-pixel margin around the edges to form a new training dataset, and then proposed an efficient and accurate Depth-wise Separable Convolutional Neural Network for breaking text-based CAPTCHAs. Most importantly, to the best of our knowledge, this is the first CAPTCHA breaking study to use the Depth-wise Separable Convolution layer to build an efficient CNN model to break text-based CAPTCHAs. We have evaluated and compared the performance of our proposed model to that of fine-tuning other popular CNN image recognition architectures on the generated CAPTCHA image dataset. At run time, our proposed model used less time to break the text-based CAPTCHAs, with an accuracy of more than 99% on the testing dataset. We observed that our proposed CNN model efficiently improved the CAPTCHA breaking accuracy and streamlined the structure of the CAPTCHA breaking network as compared to other CAPTCHA breaking techniques.
APA, Harvard, Vancouver, ISO, and other styles
13

Bouguezzi, Safa, Hana Ben Fredj, Tarek Belabed, Carlos Valderrama, Hassene Faiedh, and Chokri Souani. "An Efficient FPGA-Based Convolutional Neural Network for Classification: Ad-MobileNet." Electronics 10, no. 18 (September 16, 2021): 2272. http://dx.doi.org/10.3390/electronics10182272.

Full text
Abstract:
Convolutional Neural Networks (CNN) continue to dominate research in the area of hardware acceleration using Field Programmable Gate Arrays (FPGA), proving its effectiveness in a variety of computer vision applications such as object segmentation, image classification, face detection, and traffic signs recognition, among others. However, there are numerous constraints for deploying CNNs on FPGA, including limited on-chip memory, CNN size, and configuration parameters. This paper introduces Ad-MobileNet, an advanced CNN model inspired by the baseline MobileNet model. The proposed model uses an Ad-depth engine, which is an improved version of the depth-wise separable convolution unit. Moreover, we propose an FPGA-based implementation model that supports the Mish, TanhExp, and ReLU activation functions. The experimental results using the CIFAR-10 dataset show that our Ad-MobileNet has a classification accuracy of 88.76% while requiring little computational hardware resources. Compared to state-of-the-art methods, our proposed method has a fairly high recognition rate while using fewer computational hardware resources. Indeed, the proposed model helps to reduce hardware resources by more than 41% compared to that of the baseline model.
APA, Harvard, Vancouver, ISO, and other styles
14

Zhang, Cheng, Wanshou Jiang, and Qing Zhao. "Semantic Segmentation of Aerial Imagery via Split-Attention Networks with Disentangled Nonlocal and Edge Supervision." Remote Sensing 13, no. 6 (March 19, 2021): 1176. http://dx.doi.org/10.3390/rs13061176.

Full text
Abstract:
In this work, we propose a new deep convolution neural network (DCNN) architecture for semantic segmentation of aerial imagery. Taking advantage of recent research, we use split-attention networks (ResNeSt) as the backbone for high-quality feature expression. Additionally, a disentangled nonlocal (DNL) block is integrated into our pipeline to express the inter-pixel long-distance dependence and highlight the edge pixels simultaneously. Moreover, the depth-wise separable convolution and atrous spatial pyramid pooling (ASPP) modules are combined to extract and fuse multiscale contextual features. Finally, an auxiliary edge detection task is designed to provide edge constraints for semantic segmentation. Evaluation of algorithms is conducted on two benchmarks provided by the International Society for Photogrammetry and Remote Sensing (ISPRS). Extensive experiments demonstrate the effectiveness of each module of our architecture. Precision evaluation based on the Potsdam benchmark shows that the proposed DCNN achieves competitive performance over the state-of-the-art methods.
APA, Harvard, Vancouver, ISO, and other styles
15

Wang, Qi, Meihan Wu, Fei Yu, Chen Feng, Kaige Li, Yuemei Zhu, Eric Rigall, and Bo He. "RT-Seg: A Real-Time Semantic Segmentation Network for Side-Scan Sonar Images." Sensors 19, no. 9 (April 28, 2019): 1985. http://dx.doi.org/10.3390/s19091985.

Full text
Abstract:
Real-time processing of high-resolution sonar images is of great significance for the autonomy and intelligence of autonomous underwater vehicle (AUV) in complex marine environments. In this paper, we propose a real-time semantic segmentation network termed RT-Seg for Side-Scan Sonar (SSS) images. The proposed architecture is based on a novel encoder-decoder structure, in which the encoder blocks utilized Depth-Wise Separable Convolution and a 2-way branch for improving performance, and a corresponding decoder network is implemented to restore the details of the targets, followed by a pixel-wise classification layer. Moreover, we use patch-wise strategy for splitting the high-resolution image into local patches and applying them to network training. The well-trained model is used for testing high-resolution SSS images produced by sonar sensor in an onboard Graphic Processing Unit (GPU). The experimental results show that RT-Seg can greatly reduce the number of parameters and floating point operations compared to other networks. It runs at 25.67 frames per second on an NVIDIA Jetson AGX Xavier on 500*500 inputs with excellent segmentation result. Further insights on the speed and accuracy trade-off are discussed in this paper.
APA, Harvard, Vancouver, ISO, and other styles
16

Yuan, Q., Y. Ang, and H. Z. M. Shafri. "HYPERSPECTRAL IMAGE CLASSIFICATION USING RESIDUAL 2D AND 3D CONVOLUTIONAL NEURAL NETWORK JOINT ATTENTION MODEL." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIV-M-3-2021 (August 10, 2021): 187–93. http://dx.doi.org/10.5194/isprs-archives-xliv-m-3-2021-187-2021.

Full text
Abstract:
Hyperspectral image classification (HSIC) is a challenging task in remote sensing data analysis, which has been applied in many domains for better identification and inspection of the earth surface by extracting spectral and spatial information. The combination of abundant spectral features and accurate spatial information can improve classification accuracy. However, many traditional methods are based on handcrafted features, which brings difficulties for multi-classification tasks due to intra-class spectral heterogeneity and inter-class similarity. The deep learning algorithm, especially the convolutional neural network (CNN), has been perceived as a promising feature extractor and classifier for processing hyperspectral remote sensing images. Although 2D CNN can extract spatial features, the specific spectral properties are not used effectively, while 3D CNN has this capability but its computational burden increases as layers are stacked. To address these issues, we propose a novel HSIC framework based on a residual CNN network that integrates the advantages of 2D and 3D CNN. First, 3D convolutions focus on extracting spectral features with feature recalibration and refinement by a channel attention mechanism. The 2D depth-wise separable convolution approach with different kernel sizes concentrates on obtaining multi-scale spatial features and reducing model parameters. Furthermore, the residual structure optimizes the back-propagation for network training. The results and analysis of extensive HSIC experiments show that the proposed residual 2D-3D CNN network can effectively extract spectral and spatial features and improve classification accuracy.
APA, Harvard, Vancouver, ISO, and other styles
17

Ying, Boyu, Yuancheng Xu, Shuai Zhang, Yinggang Shi, and Li Liu. "Weed Detection in Images of Carrot Fields Based on Improved YOLO v4." Traitement du Signal 38, no. 2 (April 30, 2021): 341–48. http://dx.doi.org/10.18280/ts.380211.

Full text
Abstract:
Accurate weed detection is the premise for the precision prevention and control of weeds in fields. Machine vision offers an effective means of detecting weeds accurately. For the precision detection of various weeds among carrot seedlings, this paper improves You Only Look Once v4 (YOLO v4) into a lightweight weed detection model called YOLO v4-weeds. Specifically, the backbone network of the original YOLO v4 was replaced with MobileNetV3-Small. Combined with depth-wise separable convolution and an inverted residual structure, a lightweight attention mechanism was introduced to reduce the memory required to process images, making the detection model more efficient. The research results provide a reference for weed detection, robot weeding, and selective spraying.
APA, Harvard, Vancouver, ISO, and other styles
18

Wan, Haifeng, Lei Gao, Manman Su, Qinglong You, Hui Qu, and Qirun Sun. "A Novel Neural Network Model for Traffic Sign Detection and Recognition under Extreme Conditions." Journal of Sensors 2021 (July 9, 2021): 1–16. http://dx.doi.org/10.1155/2021/9984787.

Full text
Abstract:
Traffic sign detection is extremely important in autonomous driving and transportation safety systems. However, the accurate detection of traffic signs remains challenging, especially under extreme conditions. This paper proposes a novel model called Traffic Sign Yolo (TS-Yolo) based on the convolutional neural network to improve the detection and recognition accuracy of traffic signs, especially under low visibility and extremely restricted vision conditions. A copy-and-paste data augmentation method was used to build a large number of new samples based on existing traffic-sign datasets. Based on You Only Look Once (YoloV5), the mixed depth-wise convolution (MixConv) was employed to mix different kernel sizes in a single convolution operation, so that different patterns with various resolutions can be captured. Furthermore, the attentional feature fusion (AFF) module was integrated to fuse the features based on attention from same-layer to cross-layer scenarios, including short and long skip connections, and even performing the initial fusion with itself. The experimental results demonstrated that, using the YoloV5 dataset with augmentation, the precision was 71.92, an increase of 34.56 over the data without augmentation, and the mean average precision mAP_0.5 was 80.05, an increase of 33.11 over the data without augmentation. When MixConv and AFF were applied to the TS-Yolo model, the precision was 74.53, which was 2.61 higher than with data augmentation only, and mAP_0.5 was 83.73, which was 3.68 higher than with the YoloV5 dataset with augmentation only. Overall, the performance of the proposed method was competitive with the latest traffic sign detection approaches.
APA, Harvard, Vancouver, ISO, and other styles
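MixConv, as referenced in the entry above, splits the channels of a single depth-wise convolution into groups and applies a different kernel size to each group. A minimal sketch follows; the group sizes and kernel sizes are illustrative assumptions, not the TS-Yolo configuration:

```python
import torch
import torch.nn as nn

class MixConv(nn.Module):
    """Mixed depth-wise convolution: channels are split into groups and each
    group gets its own kernel size, so one layer captures patterns at several
    spatial scales."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        n = len(kernel_sizes)
        # distribute channels across the kernel groups (last group takes the remainder)
        self.splits = [channels // n] * (n - 1) + [channels - channels // n * (n - 1)]
        self.branches = nn.ModuleList([
            nn.Conv2d(c, c, k, padding=k // 2, groups=c, bias=False)
            for c, k in zip(self.splits, kernel_sizes)
        ])

    def forward(self, x):
        chunks = torch.split(x, self.splits, dim=1)
        return torch.cat([branch(chunk) for branch, chunk in zip(self.branches, chunks)], dim=1)

layer = MixConv(48)
print(layer(torch.randn(1, 48, 40, 40)).shape)  # torch.Size([1, 48, 40, 40])
```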
19

Kim, Sangwon, Jaeyeal Nam, and Byoungchul Ko. "Fast Depth Estimation in a Single Image Using Lightweight Efficient Neural Network." Sensors 19, no. 20 (October 13, 2019): 4434. http://dx.doi.org/10.3390/s19204434.

Full text
Abstract:
Depth estimation is a crucial and fundamental problem in the computer vision field. Conventional methods re-construct scenes using feature points extracted from multiple images; however, these approaches require multiple images and thus are not easily implemented in various real-time applications. Moreover, the special equipment required by hardware-based approaches using 3D sensors is expensive. Therefore, software-based methods for estimating depth from a single image using machine learning or deep learning are emerging as new alternatives. In this paper, we propose an algorithm that generates a depth map in real time using a single image and an optimized lightweight efficient neural network (L-ENet) algorithm instead of physical equipment, such as an infrared sensor or multi-view camera. Because depth values have a continuous nature and can produce locally ambiguous results, pixel-wise prediction with ordinal depth range classification was applied in this study. In addition, in our method various convolution techniques are applied to extract a dense feature map, and the number of parameters is greatly reduced by reducing the network layer. By using the proposed L-ENet algorithm, an accurate depth map can be generated from a single image quickly and, in a comparison with the ground truth, we can produce depth values closer to those of the ground truth with small errors. Experiments confirmed that the proposed L-ENet can achieve a significantly improved estimation performance over the state-of-the-art algorithms in depth estimation based on a single image.
APA, Harvard, Vancouver, ISO, and other styles
20

Reghukumar, Arathi, L. Jani Anbarasi, J. Prassanna, R. Manikandan, and Fadi Al-Turjman. "Vision Based Segmentation and Classification of Cracks Using Deep Neural Networks." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 29, Supp01 (March 26, 2021): 141–56. http://dx.doi.org/10.1142/s0218488521400080.

Full text
Abstract:
Deep learning artificial intelligence (AI) is a booming area in the research field. It allows the development of end-to-end models to predict outcomes based on input data without the need for manual extraction of features. This paper aims at evaluating the automatic crack detection process that is used in identifying cracks in building structures such as bridges, foundations, or other large structures using images. A hybrid approach involving image processing and deep learning algorithms is proposed to detect cracks in structures automatically. As cracks are detected in the images, they are segmented using a segmentation process. The proposed deep learning models include a hybrid architecture combining Mask R-CNN with a single-layer CNN, a 3-layer CNN, and an 8-layer CNN. These models utilize depth-wise convolution with varying dilation rates for efficiently extracting diversified features from the crack images. Further, performance evaluation shows that Mask R-CNN with a single-layer CNN achieves an accuracy of 97.5% on a normal dataset and 97.8% on a segmented dataset. The Mask R-CNN with 2-layer convolution resulted in an accuracy of 98.32% on a normal dataset and 98.39% on a segmented dataset. The Mask R-CNN with 8-layer convolution achieves an accuracy of 98.4% on a normal dataset and 98.75% on a segmented dataset. The proposed Mask R-CNN has proved its feasibility in detecting cracks in large buildings and structures.
APA, Harvard, Vancouver, ISO, and other styles
21

Zhao, Xu, Xiaoqing Liang, Chaoyang Zhao, Ming Tang, and Jinqiao Wang. "Real-Time Multi-Scale Face Detector on Embedded Devices." Sensors 19, no. 9 (May 9, 2019): 2158. http://dx.doi.org/10.3390/s19092158.

Full text
Abstract:
Face detection is the basic step in video face analysis and has been studied for many years. However, achieving real-time performance on computation-resource-limited embedded devices still remains an open challenge. To address this problem, in this paper we propose a face detector, EagleEye, which shows a good trade-off between high accuracy and fast speed on the popular embedded device with low computation power (e.g., the Raspberry Pi 3b+). The EagleEye is designed to have low floating-point operations per second (FLOPS) as well as enough capacity, and its accuracy is further improved without adding too much FLOPS. Specifically, we design five strategies for building efficient face detectors with a good balance of accuracy and running speed. The first two strategies help to build a detector with low computation complexity and enough capacity. We use convolution factorization to change traditional convolutions into more sparse depth-wise convolutions to save computation costs and we use successive downsampling convolutions at the beginning of the face detection network. The latter three strategies significantly improve the accuracy of the light-weight detector without adding too much computation costs. We design an efficient context module to utilize context information to benefit the face detection. We also adopt information preserving activation function to increase the network capacity. Finally, we use focal loss to further improve the accuracy by handling the class imbalance problem better. Experiments show that the EagleEye outperforms the other face detectors with the same order of computation costs, on both runtime efficiency and accuracy.
APA, Harvard, Vancouver, ISO, and other styles
22

Tian, Sirui, Yiyu Lin, Wenyun Gao, Hong Zhang, and Chao Wang. "A Multi-Scale U-Shaped Convolution Auto-Encoder Based on Pyramid Pooling Module for Object Recognition in Synthetic Aperture Radar Images." Sensors 20, no. 5 (March 10, 2020): 1533. http://dx.doi.org/10.3390/s20051533.

Full text
Abstract:
Although unsupervised representation learning (RL) can tackle the performance deterioration caused by limited labeled data in synthetic aperture radar (SAR) object classification, the neglected discriminative detailed information and the ignored distinctive characteristics of SAR images can lead to performance degradation. In this paper, an unsupervised multi-scale convolution auto-encoder (MSCAE) was proposed which can simultaneously obtain the global features and local characteristics of targets with its U-shaped architecture and pyramid pooling modules (PPMs). The compact depth-wise separable convolution and the deconvolution counterpart were devised to decrease the trainable parameters. The PPM and the multi-scale feature learning scheme were designed to learn multi-scale features. Prior knowledge of SAR speckle was also embedded in the model. The reconstruction loss of the MSCAE was measured by the structural similarity index metric (SSIM) of the reconstructed data and the images filtered by the improved Lee sigma filter. A speckle suppression restriction was also added in the objective function to guarantee that the speckle suppression procedure would take place in the feature learning stage. Experimental results with the MSTAR dataset under the standard operating condition and several extended operating conditions demonstrated the effectiveness of the proposed model in SAR object classification tasks.
APA, Harvard, Vancouver, ISO, and other styles
23

Cimurs, Reinis, Jin Han Lee, and Il Hong Suh. "Goal-Oriented Obstacle Avoidance with Deep Reinforcement Learning in Continuous Action Space." Electronics 9, no. 3 (February 28, 2020): 411. http://dx.doi.org/10.3390/electronics9030411.

Full text
Abstract:
In this paper, we propose a goal-oriented obstacle avoidance navigation system based on deep reinforcement learning that uses depth information in scenes, as well as goal position in polar coordinates as state inputs. The control signals for robot motion are output in a continuous action space. We devise a deep deterministic policy gradient network with the inclusion of depth-wise separable convolution layers to process the large amounts of sequential depth image information. The goal-oriented obstacle avoidance navigation is performed without prior knowledge of the environment or a map. We show that through the proposed deep reinforcement learning network, a goal-oriented collision avoidance model can be trained end-to-end without manual tuning or supervision by a human operator. We train our model in a simulation, and the resulting network is directly transferred to other environments. Experiments show the capability of the trained network to navigate safely around obstacles and arrive at the designated goal positions in the simulation, as well as in the real world. The proposed method exhibits higher reliability than the compared approaches when navigating around obstacles with complex shapes. The experiments show that the approach is capable of avoiding not only static, but also dynamic obstacles.
APA, Harvard, Vancouver, ISO, and other styles
24

Safarov, Sirojbek, and Taeg Keun Whangbo. "A-DenseUNet: Adaptive Densely Connected UNet for Polyp Segmentation in Colonoscopy Images with Atrous Convolution." Sensors 21, no. 4 (February 19, 2021): 1441. http://dx.doi.org/10.3390/s21041441.

Full text
Abstract:
Colon carcinoma is one of the leading causes of cancer-related death in both men and women. Automatic colorectal polyp segmentation and detection in colonoscopy videos help endoscopists to identify colorectal disease more easily, making it a promising method to prevent colon cancer. In this study, we developed a fully automated pixel-wise polyp segmentation model named A-DenseUNet. The proposed architecture adapts different datasets, adjusting for the unknown depth of the network by sharing multiscale encoding information to the different levels of the decoder side. We also used multiple dilated convolutions with various atrous rates to observe a large field of view without increasing the computational cost and prevent loss of spatial information, which would cause dimensionality reduction. We utilized an attention mechanism to remove noise and inappropriate information, leading to the comprehensive re-establishment of contextual features. Our experiments demonstrated that the proposed architecture achieved significant segmentation results on public datasets. A-DenseUNet achieved a 90% Dice coefficient score on the Kvasir-SEG dataset and a 91% Dice coefficient score on the CVC-612 dataset, both of which were higher than the scores of other deep learning models such as UNet++, ResUNet, U-Net, PraNet, and ResUNet++ for segmenting polyps in colonoscopy images.
APA, Harvard, Vancouver, ISO, and other styles
25

Tao, Zhen, Shiwei Ren, Yueting Shi, Xiaohua Wang, and Weijiang Wang. "Accurate and Lightweight RailNet for Real-Time Rail Line Detection." Electronics 10, no. 16 (August 23, 2021): 2038. http://dx.doi.org/10.3390/electronics10162038.

Full text
Abstract:
Railway transportation has always occupied an important position in daily life and social progress. In recent years, computer vision has made promising breakthroughs in intelligent transportation, providing new ideas for detecting rail lines. Yet the majority of rail line detection algorithms use traditional image processing to extract features, and their detection accuracy and instantaneity remain to be improved. This paper goes beyond the aforementioned limitations and proposes a rail line detection algorithm based on deep learning. First, an accurate and lightweight RailNet is designed, which takes full advantage of the powerful advanced semantic information extraction capabilities of deep convolutional neural networks to obtain high-level features of rail lines. The Segmentation Soul (SS) module is creatively added to the RailNet structure, which improves segmentation performance without any additional inference time. The Depth Wise Convolution (DWconv) is introduced in the RailNet to reduce the number of network parameters and eventually ensure real-time detection. Afterward, according to the binary segmentation maps of RailNet output, we propose the rail line fitting algorithm based on sliding window detection and apply the inverse perspective transformation. Thus the polynomial functions and curvature of the rail lines are calculated, and rail lines are identified in the original images. Furthermore, we collect a real-world rail lines dataset, named RAWRail. The proposed algorithm has been fully validated on the RAWRail dataset, running at 74 FPS, and the accuracy reaches 98.6%, which is superior to the current rail line detection algorithms and shows powerful potential in real applications.
APA, Harvard, Vancouver, ISO, and other styles
26

Wu, You, Xiaodong Zhang, and Fengzhou Fang. "Automatic Fabric Defect Detection Using Cascaded Mixed Feature Pyramid with Guided Localization." Sensors 20, no. 3 (February 6, 2020): 871. http://dx.doi.org/10.3390/s20030871.

Full text
Abstract:
Generic object detection algorithms for natural images have been proven to have excellent performance. In this paper, fabric defect detection on optical image datasets is systematically studied. In contrast to generic datasets, defect images are multi-scale, noise-filled, and blurred. Back-light intensity would also be sensitive for visual perception. Large-scale fabric defect datasets are collected, selected, and employed to fulfill the requirements of detection in industrial practice in order to address these imbalanced issues. An improved two-stage defect detector is constructed for achieving better generalization. Stacked feature pyramid networks are set up to aggregate cross-scale defect patterns on interpolating mixed depth-wise block in stage one. By sharing feature maps, center-ness and shape branches merges cascaded modules with deformable convolution to filter and refine the proposed guided anchors. After balanced sampling, the proposals are down-sampled by position-sensitive pooling for region of interest, in order to characterize interactions among fabric defect images in stage two. The experiments show that the end-to-end architecture improves the occluded defect performance of region-based object detectors as compared with the current detectors.
APA, Harvard, Vancouver, ISO, and other styles
27

Bittner, K., S. Cui, and P. Reinartz. "BUILDING EXTRACTION FROM REMOTE SENSING DATA USING FULLY CONVOLUTIONAL NETWORKS." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-1/W1 (May 31, 2017): 481–86. http://dx.doi.org/10.5194/isprs-archives-xlii-1-w1-481-2017.

Full text
Abstract:
Building detection and footprint extraction are highly demanded for many remote sensing applications. Though most previous works have shown promising results, the automatic extraction of building footprints still remains a nontrivial topic, especially in complex urban areas. Recently developed extensions of the CNN framework made it possible to perform dense pixel-wise classification of input images. Based on these abilities we propose a methodology, which automatically generates a full resolution binary building mask out of a Digital Surface Model (DSM) using a Fully Convolutional Network (FCN) architecture. The advantage of using the depth information is that it provides geometrical silhouettes and allows a better separation of buildings from the background, as well as through its invariance to illumination and color variations. The proposed framework has mainly two steps. Firstly, the FCN is trained on a large set of patches consisting of normalized DSM (nDSM) as inputs and available ground truth building masks as target outputs. Secondly, the generated predictions from the FCN are viewed as unary terms for a fully connected Conditional Random Field (FCRF), which enables us to create the final binary building mask. A series of experiments demonstrate that our methodology is able to extract accurate building footprints which are close to the buildings' original shapes to a high degree. The quantitative and qualitative analysis shows significant improvement of the results in contrast to the multi-layer fully connected network from our previous work.
APA, Harvard, Vancouver, ISO, and other styles
28

Ullah, Shan, and Deok-Hwan Kim. "Lightweight Driver Behavior Identification Model with Sparse Learning on In-Vehicle CAN-BUS Sensor Data." Sensors 20, no. 18 (September 4, 2020): 5030. http://dx.doi.org/10.3390/s20185030.

Full text
Abstract:
This study focuses on driver-behavior identification and its application to finding embedded solutions in a connected car environment. We present a lightweight, end-to-end deep-learning framework for performing driver-behavior identification using in-vehicle controller area network (CAN-BUS) sensor data. The proposed method outperforms the state-of-the-art driver-behavior profiling models. Particularly, it exhibits significantly reduced computations (i.e., reduced numbers both of floating-point operations and parameters), more efficient memory usage (compact model size), and less inference time. The proposed architecture features depth-wise convolution, along with augmented recurrent neural networks (long short-term memory or gated recurrent unit), for time-series classification. The minimum time-step length (window size) required in the proposed method is significantly lower than that required by recent algorithms. We compared our results with compressed versions of existing models by applying efficient channel pruning on several layers of current models. Furthermore, our network can adapt to new classes using sparse-learning techniques, that is, by freezing relatively strong nodes at the fully connected layer for the existing classes and improving the weaker nodes by retraining them using data regarding the new classes. We successfully deploy the proposed method in a container environment using NVIDIA Docker in an embedded system (Xavier, TX2, and Nano) and comprehensively evaluate it with regard to numerous performance metrics.
APA, Harvard, Vancouver, ISO, and other styles
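The pairing described in the entry above — depth-wise convolution over the sensor channels followed by a recurrent layer for time-series classification — can be sketched roughly as follows. The window length, sensor count, GRU size, and class count are illustrative assumptions rather than the paper's settings:

```python
import torch
import torch.nn as nn

class DepthwiseConvGRU(nn.Module):
    """Per-sensor (depth-wise) 1D convolution to extract local temporal patterns,
    followed by a GRU and a linear classifier over the final hidden state."""
    def __init__(self, num_sensors=20, hidden=64, num_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            # groups=num_sensors: each CAN-BUS channel is filtered independently
            nn.Conv1d(num_sensors, num_sensors, kernel_size=5, padding=2,
                      groups=num_sensors, bias=False),
            nn.BatchNorm1d(num_sensors), nn.ReLU(inplace=True),
        )
        self.gru = nn.GRU(num_sensors, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, num_classes)

    def forward(self, x):            # x: (batch, sensors, time)
        x = self.conv(x)
        x = x.transpose(1, 2)        # GRU expects (batch, time, features)
        _, h = self.gru(x)           # h: (1, batch, hidden)
        return self.cls(h[-1])

model = DepthwiseConvGRU()
print(model(torch.randn(4, 20, 60)).shape)  # torch.Size([4, 10])
```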
29

Gao, Fei, Yishan He, Jun Wang, Amir Hussain, and Huiyu Zhou. "Anchor-free Convolutional Network with Dense Attention Feature Aggregation for Ship Detection in SAR Images." Remote Sensing 12, no. 16 (August 13, 2020): 2619. http://dx.doi.org/10.3390/rs12162619.

Full text
Abstract:
In recent years, with the improvement of synthetic aperture radar (SAR) imaging resolution, it is urgent to develop methods with higher accuracy and faster speed for ship detection in high-resolution SAR images. Among all kinds of methods, deep-learning-based algorithms bring promising performance due to end-to-end detection and automated feature extraction. However, several challenges still exist: (1) standard deep learning detectors based on anchors have certain unsolved problems, such as tuning of anchor-related parameters, scale-variation and high computational costs. (2) SAR data is huge but the labeled data is relatively small, which may lead to overfitting in training. (3) To improve detection speed, deep learning detectors generally detect targets based on low-resolution features, which may cause missed detections for small targets. In order to address the above problems, an anchor-free convolutional network with dense attention feature aggregation is proposed in this paper. Firstly, we use a lightweight feature extractor to extract multiscale ship features. The inverted residual blocks with depth-wise separable convolution reduce the network parameters and improve the detection speed. Secondly, a novel feature aggregation scheme called dense attention feature aggregation (DAFA) is proposed to obtain a high-resolution feature map with multiscale information. By combining the multiscale features through dense connections and iterative fusions, DAFA improves the generalization performance of the network. In addition, an attention block, namely spatial and channel squeeze and excitation (SCSE) block is embedded in the upsampling process of DAFA to enhance the salient features of the target and suppress the background clutters. Third, an anchor-free detector, which is a center-point-based ship predictor (CSP), is adopted in this paper. CSP regresses the ship centers and ship sizes simultaneously on the high-resolution feature map to implement anchor-free and nonmaximum suppression (NMS)-free ship detection. The experiments on the AirSARShip-1.0 dataset demonstrate the effectiveness of our method. The results show that the proposed method outperforms several mainstream detection algorithms in both accuracy and speed.
APA, Harvard, Vancouver, ISO, and other styles
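A sketch of the inverted residual block with depth-wise separable convolution mentioned in the entry above, following the common expand → depth-wise filter → linear projection pattern. The expansion factor of 6 is the value popularized by MobileNetV2 and is an assumption here, not a detail taken from the paper:

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expand channels with a 1x1 conv, filter with a depth-wise 3x3 conv, then
    project back down with a linear 1x1 conv; add a residual connection when the
    input and output shapes match."""
    def __init__(self, in_ch, out_ch, stride=1, expansion=6):
        super().__init__()
        mid = in_ch * expansion
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),  # linear bottleneck: no activation
        )

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_residual else y

print(InvertedResidual(32, 32)(torch.randn(1, 32, 64, 64)).shape)  # [1, 32, 64, 64]
```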
30

Lawrence, Tom, and Li Zhang. "IoTNet: An Efficient and Accurate Convolutional Neural Network for IoT Devices." Sensors 19, no. 24 (December 14, 2019): 5541. http://dx.doi.org/10.3390/s19245541.

Full text
Abstract:
Two main approaches exist when deploying a Convolutional Neural Network (CNN) on resource-constrained IoT devices: either scale a large model down or use a small model designed specifically for resource-constrained environments. Small architectures typically trade accuracy for computational cost by performing convolutions as depth-wise convolutions rather than standard convolutions like in large networks. Large models focus primarily on state-of-the-art performance and often struggle to scale down sufficiently. We propose a new model, namely IoTNet, designed for resource-constrained environments which achieves state-of-the-art performance within the domain of small efficient models. IoTNet trades accuracy with computational cost differently from existing methods by factorizing standard 3 × 3 convolutions into pairs of 1 × 3 and 3 × 1 standard convolutions, rather than performing depth-wise convolutions. We benchmark IoTNet against state-of-the-art efficiency-focused models and scaled-down large architectures on data sets which best match the complexity of problems faced in resource-constrained environments. We compare model accuracy and the number of floating-point operations (FLOPs) performed as a measure of efficiency. We report state-of-the-art accuracy improvement over MobileNetV2 on CIFAR-10 of 13.43% with 39% fewer FLOPs, over ShuffleNet on Street View House Numbers (SVHN) of 6.49% with 31.8% fewer FLOPs and over MobileNet on German Traffic Sign Recognition Benchmark (GTSRB) of 5% with 0.38% fewer FLOPs.
APA, Harvard, Vancouver, ISO, and other styles
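IoTNet's factorization, as described in the entry above, replaces a standard 3 × 3 convolution with a pair of standard 1 × 3 and 3 × 1 convolutions rather than with a depth-wise convolution. A minimal sketch with an illustrative parameter count (the channel sizes are arbitrary):

```python
import torch
import torch.nn as nn

class FactorizedConv(nn.Module):
    """Pair of 1x3 and 3x1 standard convolutions used in place of one 3x3
    convolution; when the channel counts match, this cuts the weight count by
    roughly one third (6*C*C versus 9*C*C)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, 3), padding=(0, 1), bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, (3, 1), padding=(1, 0), bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.conv(x)

# For 64 -> 64 channels: a 3x3 conv needs 9*64*64 = 36,864 weights,
# the 1x3 + 3x1 pair needs (3 + 3)*64*64 = 24,576 weights.
print(FactorizedConv(64, 64)(torch.randn(1, 64, 32, 32)).shape)  # [1, 64, 32, 32]
```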
31

Walia, Inderpreet, Muskan Srivastava, Deepika Kumar, Mehar Rani, Parth Muthreja, and Gaurav Mohadikar. "Pneumonia Detection using Depth-Wise Convolutional Neural Network (DW-CNN)." EAI Endorsed Transactions on Pervasive Health and Technology 6, no. 23 (September 22, 2020): 166290. http://dx.doi.org/10.4108/eai.28-5-2020.166290.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Hu, Ruiqi, Shirui Pan, Guodong Long, Qinghua Lu, Liming Zhu, and Jing Jiang. "Going Deep: Graph Convolutional Ladder-Shape Networks." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 03 (April 3, 2020): 2838–45. http://dx.doi.org/10.1609/aaai.v34i03.5673.

Full text
Abstract:
Neighborhood aggregation algorithms like spectral graph convolutional networks (GCNs) formulate graph convolutions as a symmetric Laplacian smoothing operation to aggregate the feature information of one node with that of its neighbors. While they have achieved great success in semi-supervised node classification on graphs, current approaches suffer from the over-smoothing problem when the depth of the neural networks increases, which always leads to a noticeable degradation of performance. To solve this problem, we present graph convolutional ladder-shape networks (GCLN), a novel graph neural network architecture that transmits messages from shallow layers to deeper layers to overcome the over-smoothing problem and dramatically extend the scale of the neural networks with improved performance. We have validated the effectiveness of proposed GCLN at a node-wise level with a semi-supervised task (node classification) and an unsupervised task (node clustering), and at a graph-wise level with graph classification by applying a differentiable pooling operation. The proposed GCLN outperforms original GCNs, deep GCNs and other state-of-the-art GCN-based models for all three tasks, which were designed from various perspectives on six real-world benchmark data sets.
APA, Harvard, Vancouver, ISO, and other styles
33

Zhang, Ru, Feng Zhu, Jianyi Liu, and Gongshen Liu. "Depth-Wise Separable Convolutions and Multi-Level Pooling for an Efficient Spatial CNN-Based Steganalysis." IEEE Transactions on Information Forensics and Security 15 (2020): 1138–50. http://dx.doi.org/10.1109/tifs.2019.2936913.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Fu, Kui, Peipei Shi, Yafei Song, Shiming Ge, Xiangju Lu, and Jia Li. "Ultrafast Video Attention Prediction with Coupled Knowledge Distillation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 10802–9. http://dx.doi.org/10.1609/aaai.v34i07.6710.

Full text
Abstract:
Large convolutional neural network models have recently demonstrated impressive performance on video attention prediction. Conventionally, these models involve intensive computation and large memory. To address these issues, we design an extremely light-weight network with ultrafast speed, named UVA-Net. The network is constructed based on depth-wise convolutions and takes low-resolution images as input. However, this straightforward acceleration method will decrease performance dramatically. To this end, we propose a coupled knowledge distillation strategy to augment and train the network effectively. With this strategy, the model can further automatically discover and emphasize implicit useful cues contained in the data. Both spatial and temporal knowledge learned by the high-resolution complex teacher networks can also be distilled and transferred into the proposed low-resolution light-weight spatiotemporal network. Experimental results show that the performance of our model is comparable to 11 state-of-the-art models in video attention prediction, while it costs only a 0.68 MB memory footprint and runs at about 10,106 FPS on GPU and 404 FPS on CPU, which is 206 times faster than previous models.
APA, Harvard, Vancouver, ISO, and other styles
35

Feng, Fan, Shuangting Wang, Chunyang Wang, and Jin Zhang. "Learning Deep Hierarchical Spatial–Spectral Features for Hyperspectral Image Classification Based on Residual 3D-2D CNN." Sensors 19, no. 23 (November 29, 2019): 5276. http://dx.doi.org/10.3390/s19235276.

Full text
Abstract:
Every pixel in a hyperspectral image contains detailed spectral information in hundreds of narrow bands captured by hyperspectral sensors. Pixel-wise classification of a hyperspectral image is the cornerstone of various hyperspectral applications. Nowadays, deep learning models represented by the convolutional neural network (CNN) provides an ideal solution for feature extraction, and has made remarkable achievements in supervised hyperspectral classification. However, hyperspectral image annotation is time-consuming and laborious, and available training data is usually limited. Due to the “small-sample problem”, CNN-based hyperspectral classification is still challenging. Focused on the limited sample-based hyperspectral classification, we designed an 11-layer CNN model called R-HybridSN (Residual-HybridSN) from the perspective of network optimization. With an organic combination of 3D-2D-CNN, residual learning, and depth-separable convolutions, R-HybridSN can better learn deep hierarchical spatial–spectral features with very few training data. The performance of R-HybridSN is evaluated over three public available hyperspectral datasets on different amounts of training samples. Using only 5%, 1%, and 1% labeled data for training in Indian Pines, Salinas, and University of Pavia, respectively, the classification accuracy of R-HybridSN is 96.46%, 98.25%, 96.59%, respectively, which is far better than the contrast models.
APA, Harvard, Vancouver, ISO, and other styles
36

Yin, Ming. "Efficient Monocular Depth Estimation with Transfer Feature Enhancement." International Journal of Circuits, Systems and Signal Processing 15 (August 27, 2021): 1165–73. http://dx.doi.org/10.46300/9106.2021.15.127.

Full text
Abstract:
Estimating the depth of the scene from a monocular image is an essential step for image semantic understanding. Practically, some existing methods for this highly ill-posed issue still lack robustness and efficiency. This paper proposes a novel end-to-end depth estimation model with skip connections from a pre-trained Xception model for dense feature extraction, and three new modules are designed to improve the upsampling process. In addition, ELU activation and convolutions with smaller kernel size are added to improve the pixel-wise regression process. The experimental results show that our model has fewer network parameters, a lower error rate than the most advanced networks, and requires only half the training time. The evaluation is based on the NYU v2 dataset, and our proposed model can achieve clearer boundary details with state-of-the-art effects and robustness.
APA, Harvard, Vancouver, ISO, and other styles
37

Liu, Xiangyu. "Validation Research on the Application of Depth-wise Separable Convolutional AI Facial Expression Recognition in Non-pharmacological Treatment of BPSD." Journal of Clinical and Nursing Research 5, no. 4 (August 2, 2021): 31–37. http://dx.doi.org/10.26689/jcnr.v5i4.2325.

Full text
Abstract:
Among the most obvious clinical signs of dementia, or the Behavioral and Psychological Symptoms of Dementia (BPSD), are a lack of emotional expression, an increased frequency of negative emotions, and the impermanence of emotions. Observing the reduction of BPSD in dementia through emotions can be considered effective and is widely used in the field of non-pharmacological therapy. This article verifies whether an image recognition artificial intelligence (AI) system can correctly reflect the emotional performance of elderly people with dementia through a questionnaire survey of three professional elderly nursing staff. ANOVA (sig. = 0.50) is used to determine that the judgments given by the nursing staff have no obvious deviation, and then Kendall’s test (0.722**) and Spearman’s test (0.863**) are used to verify that the severity judgments of the emotion recognition system and the nursing staff are consistent. This implies the usability of the tool. Additionally, it can be expected to be further applied in research related to emotion detection in elderly people with BPSD.
APA, Harvard, Vancouver, ISO, and other styles
38

Khoshboresh-Masouleh, Mehdi, and Reza Shah-Hosseini. "A Deep Multi-Modal Learning Method and a New RGB-Depth Data Set for Building Roof Extraction." Photogrammetric Engineering & Remote Sensing 87, no. 10 (October 1, 2021): 759–66. http://dx.doi.org/10.14358/pers.21-00007r2.

Full text
Abstract:
This study focuses on tackling the challenge of building mapping in multi-modal remote sensing data by proposing a novel, deep superpixel-wise convolutional neural network called DeepQuantized-Net, plus a new red, green, blue (RGB)-depth data set named IND. DeepQuantized-Net incorporates two practical ideas in segmentation: first, improving the object pattern by exploiting superpixels instead of pixels as the imaging unit in DeepQuantized-Net; second, reducing the computational cost. The generated data set includes 294 RGB-depth images (256 training images and 38 test images) from different locations in the state of Indiana in the U.S., with 1024 × 1024 pixels and a spatial resolution of 0.5 ft, covering different cities. The experimental results using the IND data set demonstrate that the mean F1 scores and the average Intersection over Union scores could increase by approximately 7.0% and 7.2%, respectively, compared to other methods.
APA, Harvard, Vancouver, ISO, and other styles
39

Schmitz, M., H. Huang, and H. Mayer. "COMPARISON OF TRAINING STRATEGIES FOR CONVNETS ON MULTIPLE SIMILAR DATASETS FOR FACADE SEGMENTATION." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W13 (June 4, 2019): 111–17. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w13-111-2019.

Full text
Abstract:
In this paper, we analyze different training strategies and accompanying architectures for Convolutional Networks (ConvNets) when multiple similar datasets are available, using the semantic segmentation of rectified facade images as an example. In addition to direct training on the target dataset, we analyze multi-task learning and fine-tuning. When using multi-task learning to train a ConvNet, multiple objectives are optimized in parallel; fine-tuning optimizes these objectives sequentially. For both strategies, the tasks share a common part of the ConvNet, for which we vary the depth. We present results for all strategies and compare them with respect to overall pixel-wise accuracy, showing that for the special case of facade segmentation there are no significant differences between using multiple datasets or a single one, or between the different training strategies.
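A minimal sketch of the multi-task setup described above: one shared convolutional trunk with a separate segmentation head per facade dataset. The architecture, depth, and channel counts are assumptions for illustration, not the paper's exact model.

```python
# Shared trunk + per-dataset heads; in multi-task learning the trunk is
# optimized jointly across datasets, in fine-tuning sequentially.
import torch
import torch.nn as nn

class SharedTrunk(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.body(x)

class MultiTaskSegmenter(nn.Module):
    def __init__(self, num_classes_per_dataset):
        super().__init__()
        self.trunk = SharedTrunk()
        # One 1x1 classification head per dataset.
        self.heads = nn.ModuleList(
            [nn.Conv2d(64, n, kernel_size=1) for n in num_classes_per_dataset]
        )
    def forward(self, x, dataset_idx):
        return self.heads[dataset_idx](self.trunk(x))

model = MultiTaskSegmenter(num_classes_per_dataset=[8, 9, 12])
logits = model(torch.randn(2, 3, 256, 256), dataset_idx=1)
print(logits.shape)  # torch.Size([2, 9, 256, 256])
```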
APA, Harvard, Vancouver, ISO, and other styles
40

Guo, Xiaopeng, Rencan Nie, Jinde Cao, Dongming Zhou, and Wenhua Qian. "Fully Convolutional Network-Based Multifocus Image Fusion." Neural Computation 30, no. 7 (July 2018): 1775–800. http://dx.doi.org/10.1162/neco_a_01098.

Full text
Abstract:
As the optical lenses of cameras always have a limited depth of field, captured images of the same scene are not all in focus. Multifocus image fusion is an efficient technology that can synthesize an all-in-focus image from several partially focused images. Previous methods have accomplished the fusion task in spatial or transform domains. However, fusion rules are always a problem in most methods. In this letter, from the aspect of focus region detection, we propose a novel multifocus image fusion method based on a fully convolutional network (FCN) learned from synthesized multifocus images. The primary novelty of this method is that the pixel-wise focus regions are detected through a learned FCN, and the entire image, not just image patches, is exploited to train the FCN. First, we synthesize 4500 pairs of multifocus images by repeatedly applying a Gaussian filter to each image from PASCAL VOC 2012 to train the FCN. After that, a pair of source images is fed into the trained FCN, and two score maps indicating the focus property are generated. Next, an inverted score map is averaged with the other score map to produce an aggregative score map, which takes full advantage of the focus probabilities in both score maps. We apply a fully connected conditional random field (CRF) to the aggregative score map to obtain and refine a binary decision map for the fusion task. Finally, we exploit a weighted strategy based on the refined decision map to produce the fused image. To demonstrate the performance of the proposed method, we compare its fused results with several state-of-the-art methods on both a gray data set and a color data set. Experimental results show that the proposed method achieves superior fusion performance in both human visual quality and objective assessment.
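A simplified sketch of the fusion rule outlined in the abstract: two per-pixel focus score maps are aggregated (one inverted and averaged with the other), thresholded into a binary decision map, and used to weight the source images. The CRF refinement step is omitted and the arrays are placeholders; this is not the authors' code.

```python
# Score-map aggregation and weighted fusion for a pair of multifocus images.
import numpy as np

def fuse(img_a, img_b, score_a, score_b):
    # Aggregate: average score_a with the inverse of score_b.
    aggregate = 0.5 * (score_a + (1.0 - score_b))
    decision = (aggregate > 0.5).astype(np.float32)  # binary decision map
    return decision[..., None] * img_a + (1.0 - decision[..., None]) * img_b

img_a = np.random.rand(128, 128, 3)   # placeholder source image A
img_b = np.random.rand(128, 128, 3)   # placeholder source image B
score_a = np.random.rand(128, 128)    # focus probability map for A
score_b = np.random.rand(128, 128)    # focus probability map for B
fused = fuse(img_a, img_b, score_a, score_b)
print(fused.shape)  # (128, 128, 3)
```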
APA, Harvard, Vancouver, ISO, and other styles
41

Madhuanand, L., F. Nex, and M. Y. Yang. "DEEP LEARNING FOR MONOCULAR DEPTH ESTIMATION FROM UAV IMAGES." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences V-2-2020 (August 3, 2020): 451–58. http://dx.doi.org/10.5194/isprs-annals-v-2-2020-451-2020.

Full text
Abstract:
Depth is an essential component for various scene understanding tasks and for reconstructing the 3D geometry of a scene. Estimating depth from stereo images requires multiple views of the same scene, which is often not possible when exploring new environments with a UAV. To overcome this, monocular depth estimation has become a topic of interest with the recent advancements in computer vision and deep learning. This research has largely focused on indoor scenes or outdoor scenes captured at ground level. Single image depth estimation from aerial images has been limited due to the additional complexities arising from the larger camera distance and the wider area coverage with many occlusions. A new aerial image dataset is prepared specifically for this purpose, combining Unmanned Aerial Vehicle (UAV) images covering different regions, features, and points of view. The single image depth estimation is based on image reconstruction techniques which use stereo images to learn to estimate depth from single images. Among the various available models for ground-level single image depth estimation, two models, 1) a Convolutional Neural Network (CNN) and 2) a Generative Adversarial model (GAN), are used to learn depth from aerial UAV images. These models generate pixel-wise disparity images which can be converted into depth information. The generated disparity maps from these models are evaluated for their internal quality using various error metrics. The results show that the CNN model generates smoother images with a larger disparity range, while the GAN model generates sharper images with a smaller disparity range. The produced disparity images are converted to depth information and compared with point clouds obtained using Pix4D. It is found that the CNN model performs better than the GAN and produces depth similar to that of Pix4D. This comparison helps in streamlining the efforts to produce depth from a single aerial image.
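A short sketch of the disparity-to-depth conversion mentioned above, using the standard pinhole-stereo relation depth = f * B / disparity; the focal length and baseline values are placeholders, not from the paper.

```python
# Convert a predicted disparity map (in pixels) into metric depth.
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    return focal_length_px * baseline_m / np.maximum(disparity, eps)

disparity = np.random.uniform(1.0, 64.0, size=(480, 640))  # placeholder disparity
depth = disparity_to_depth(disparity, focal_length_px=1200.0, baseline_m=0.3)
print(depth.min(), depth.max())
```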
APA, Harvard, Vancouver, ISO, and other styles
42

Rogge, Ségolène, Ionut Schiopu, and Adrian Munteanu. "Depth Estimation for Light-Field Images Using Stereo Matching and Convolutional Neural Networks." Sensors 20, no. 21 (October 30, 2020): 6188. http://dx.doi.org/10.3390/s20216188.

Full text
Abstract:
The paper presents a novel depth-estimation method for light-field (LF) images based on innovative multi-stereo matching and machine-learning techniques. In the first stage, a novel block-based stereo matching algorithm is employed to compute the initial estimation. The proposed algorithm is specifically designed to operate on any pair of sub-aperture images (SAIs) in the LF image and to compute the pair’s corresponding disparity map. For the central SAI, a disparity fusion technique is proposed to compute the initial disparity map based on all available pairwise disparities. In the second stage, a novel pixel-wise deep-learning (DL)-based method for residual error prediction is employed to further refine the disparity estimation. A novel neural network architecture is proposed based on a new structure of layers. The proposed DL-based method is employed to predict the residual error of the initial estimation and to refine the final disparity map. The experimental results demonstrate the superiority of the proposed framework and reveal that the proposed method achieves an average improvement of 15.65% in root mean squared error (RMSE), 43.62% in mean absolute error (MAE), and 5.03% in structural similarity index (SSIM) over machine-learning-based state-of-the-art methods.
APA, Harvard, Vancouver, ISO, and other styles
43

Liang, Liming, Zhimin Lan, Wen Xiong, and Xiaoqi Sheng. "Retinal Vessel Segmentation Based on W-Net Conditional Generative Adversarial Nets." Journal of Medical Imaging and Health Informatics 11, no. 7 (July 1, 2021): 2016–24. http://dx.doi.org/10.1166/jmihi.2021.3633.

Full text
Abstract:
Accurate extraction of retinal vessels is an important factor in computer-aided diagnosis of ophthalmologic diseases. Due to the low sensitivity and insufficient segmentation of tiny blood vessels in existing segmentation algorithms, a novel retinal vessel segmentation algorithm is proposed, based on conditional generative adversarial nets with W-net as the generator. More specifically, first, the U-net is expanded into a W-net through skip connections, as the U-net benefits microvascular information transmission in the skip connection layers, which accelerates network convergence and improves parameter utilization. Second, the standard convolutions are replaced by depth-wise separable convolutions, thus expanding the network while reducing the number of parameters. Third, residual blocks are employed to mitigate gradient vanishing and explosion. Fourth, each skip connection is followed by a Squeeze-and-Excitation block so that shallow and deep features can be effectively fused by learning the interdependence of feature channels. The loss function of the conditional generative adversarial nets is modified so that the overall segmentation performance is optimal while retaining a strong global penalty in the adversarial learning model. Finally, an experiment is carried out on the DRIVE dataset with image enhancement and data expansion. The results show a segmentation sensitivity of 87.18%, with specificity, accuracy, and AUC of 98.19%, 96.95%, and 98.42%, respectively, indicating that the overall performance and sensitivity are better than those of existing algorithms.
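A minimal sketch of the depth-wise separable convolution used to replace the standard convolutions in this generator: a per-channel (depth-wise) 3x3 convolution followed by a 1x1 point-wise convolution. This is the generic building block, not the paper's exact layer configuration.

```python
# Depth-wise separable convolution: groups=in_ch makes the first convolution
# operate per channel; the 1x1 point-wise convolution then mixes channels.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 128, 128)
print(DepthwiseSeparableConv(64, 128)(x).shape)  # torch.Size([1, 128, 128, 128])
```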
APA, Harvard, Vancouver, ISO, and other styles
44

Morrison, Douglas, Peter Corke, and Jürgen Leitner. "Learning robust, real-time, reactive robotic grasping." International Journal of Robotics Research 39, no. 2-3 (June 26, 2019): 183–201. http://dx.doi.org/10.1177/0278364919859066.

Full text
Abstract:
We present a novel approach to perform object-independent grasp synthesis from depth images via deep neural networks. Our generative grasping convolutional neural network (GG-CNN) predicts a pixel-wise grasp quality that can be deployed in closed-loop grasping scenarios. GG-CNN overcomes shortcomings in existing techniques, namely discrete sampling of grasp candidates and long computation times. The network is orders of magnitude smaller than other state-of-the-art approaches while achieving better performance, particularly in clutter. We run a suite of real-world tests, during which we achieve an 84% grasp success rate on a set of previously unseen objects with adversarial geometry and 94% on household items. The lightweight nature enables closed-loop control of up to 50 Hz, with which we observed 88% grasp success on a set of household objects that are moved during the grasp attempt. We further propose a method combining our GG-CNN with a multi-view approach, which improves overall grasp success rate in clutter by 10%. Code is provided at https://github.com/dougsm/ggcnn
APA, Harvard, Vancouver, ISO, and other styles
45

Yang, Cheng, and Guanming Lu. "Deeply Recursive Low- and High-Frequency Fusing Networks for Single Image Super-Resolution." Sensors 20, no. 24 (December 18, 2020): 7268. http://dx.doi.org/10.3390/s20247268.

Full text
Abstract:
With the development of research on single image super-resolution (SISR) based on convolutional neural networks (CNN), the quality of recovered images has been remarkably improved. Many deep learning-based models have since been proposed that outperform traditional SISR algorithms. According to the results of extensive experiments, the feature representations of a model can be enhanced by increasing the depth and width of the network, which can ultimately improve the image reconstruction quality. However, a larger network generally consumes more computational and memory resources, making it difficult to train and increasing the prediction time. In view of these problems, a novel deeply-recursive low- and high-frequency fusing network (DRFFN) for SISR tasks is proposed in this paper, which adopts a parallel-branch structure to extract the low- and high-frequency information of the image, respectively. The different complexities of the branches reflect the frequency characteristics of the diverse image information. Moreover, an effective channel-wise attention mechanism based on variance (VCA) is designed to distribute information across feature maps more reasonably according to their different variances. Owing to the model structure (i.e., cascading recursive learning of recursive units), DRFFN and DRFFN-L are very compact, as the weights are shared by all convolutional recursions. Comprehensive evaluations on standard benchmark datasets demonstrate that DRFFN outperforms most existing models and achieves competitive quantitative and visual results.
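A rough sketch of a variance-based channel-wise attention gate in the spirit of the VCA module described above; the paper's exact formulation may differ, so treat this as an assumption-laden approximation rather than the authors' design.

```python
# Channel attention driven by the spatial variance of each feature map,
# instead of the mean pooling used in standard squeeze-and-excitation.
import torch
import torch.nn as nn

class VarianceChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
    def forward(self, x):
        # Describe each feature map by its spatial variance.
        var = x.flatten(2).var(dim=2)               # (N, C)
        weights = self.fc(var).unsqueeze(-1).unsqueeze(-1)
        return x * weights

x = torch.randn(2, 64, 32, 32)
print(VarianceChannelAttention(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```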
APA, Harvard, Vancouver, ISO, and other styles
46

Mazhar, Osama, Sofiane Ramdani, and Andrea Cherubini. "A Deep Learning Framework for Recognizing Both Static and Dynamic Gestures." Sensors 21, no. 6 (March 23, 2021): 2227. http://dx.doi.org/10.3390/s21062227.

Full text
Abstract:
Intuitive user interfaces are indispensable for interacting with human-centric smart environments. In this paper, we propose a unified framework that recognizes both static and dynamic gestures, using simple RGB vision (without depth sensing). This feature makes it suitable for inexpensive human-robot interaction in social or industrial settings. We employ a pose-driven spatial attention strategy, which guides our proposed Static and Dynamic gestures Network (StaDNet). From the image of the human upper body, we estimate the person's depth, along with the regions of interest around their hands. The Convolutional Neural Network (CNN) in StaDNet is fine-tuned on a background-substituted hand gestures dataset. It is utilized to detect 10 static gestures for each hand as well as to obtain the hand image-embeddings. These are subsequently fused with the augmented pose vector and then passed to the stacked Long Short-Term Memory blocks. Thus, human-centred frame-wise information from the augmented pose vector and from the left/right hand image-embeddings is aggregated in time to predict the dynamic gestures of the performing person. In a number of experiments, we show that the proposed approach surpasses the state-of-the-art results on the large-scale Chalearn 2016 dataset. Moreover, we transfer the knowledge learned through the proposed methodology to the Praxis gestures dataset, and the obtained results also outscore the state-of-the-art on this dataset.
APA, Harvard, Vancouver, ISO, and other styles
47

Zhang, W., H. Huang, M. Schmitz, X. Sun, H. Wang, and H. Mayer. "A MULTI-RESOLUTION FUSION MODEL INCORPORATING COLOR AND ELEVATION FOR SEMANTIC SEGMENTATION." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-1/W1 (May 31, 2017): 513–17. http://dx.doi.org/10.5194/isprs-archives-xlii-1-w1-513-2017.

Full text
Abstract:
In recent years, the developments for Fully Convolutional Networks (FCN) have led to great improvements for semantic segmentation in various applications including fused remote sensing data. There is, however, a lack of an in-depth study inside FCN models which would lead to an understanding of the contribution of individual layers to specific classes and their sensitivity to different types of input data. In this paper, we address this problem and propose a fusion model incorporating infrared imagery and Digital Surface Models (DSM) for semantic segmentation. The goal is to utilize heterogeneous data more accurately and effectively in a single model instead of to assemble multiple models. First, the contribution and sensitivity of layers concerning the given classes are quantified by means of their recall in FCN. The contribution of different modalities on the pixel-wise prediction is then analyzed based on visualization. Finally, an optimized scheme for the fusion of layers with color and elevation information into a single FCN model is derived based on the analysis. Experiments are performed on the ISPRS Vaihingen 2D Semantic Labeling dataset. Comprehensive evaluations demonstrate the potential of the proposed approach.
APA, Harvard, Vancouver, ISO, and other styles
48

Ophoff, Tanguy, Cédric Gullentops, Kristof Van Beeck, and Toon Goedemé. "Investigating the Potential of Network Optimization for a Constrained Object Detection Problem." Journal of Imaging 7, no. 4 (April 1, 2021): 64. http://dx.doi.org/10.3390/jimaging7040064.

Full text
Abstract:
Object detection models are usually trained and evaluated on highly complicated, challenging academic datasets, which results in deep networks requiring a large amount of computation. However, many operational use-cases consist of more constrained situations: a limited number of classes to be detected, less intra-class variance, less lighting and background variance, constrained or even fixed camera viewpoints, etc. In these cases, we hypothesize that smaller networks could be used without deteriorating the accuracy. However, there are multiple reasons why this does not happen in practice. Firstly, overparameterized networks tend to learn better, and secondly, transfer learning is usually used to reduce the necessary amount of training data. In this paper, we investigate how much we can reduce the computational complexity of a standard object detection network in such constrained object detection problems. As a case study, we focus on a well-known single-shot object detector, YoloV2, and combine three different techniques to reduce the computational complexity of the model without reducing its accuracy on our target dataset. To investigate the influence of the problem complexity, we compare two datasets: a prototypical academic (Pascal VOC) and a real-life operational (LWIR person detection) dataset. The three optimization steps we exploited are: swapping all convolutions for depth-wise separable convolutions, performing pruning, and using weight quantization. The results of our case study indeed substantiate our hypothesis that the more constrained a problem is, the more the network can be optimized. On the constrained operational dataset, combining these optimization techniques allowed us to reduce the computational complexity by a factor of 349, compared to only a factor of 9.8 on the academic dataset. When running a benchmark on an Nvidia Jetson AGX Xavier, our fastest model runs more than 15 times faster than the original YoloV2 model, whilst increasing the accuracy by 5% Average Precision (AP).
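A back-of-the-envelope illustration of the first optimization step mentioned above: the parameter-count saving obtained by swapping a standard convolution for a depth-wise separable one. The layer sizes are arbitrary examples, not taken from the paper.

```python
# Parameter counts for a standard k x k convolution vs. a depth-wise separable
# one (depth-wise k x k per channel + point-wise 1 x 1), ignoring biases.
def standard_conv_params(in_ch, out_ch, k):
    return in_ch * out_ch * k * k

def depthwise_separable_params(in_ch, out_ch, k):
    return in_ch * k * k + in_ch * out_ch

in_ch, out_ch, k = 256, 512, 3
std = standard_conv_params(in_ch, out_ch, k)
dws = depthwise_separable_params(in_ch, out_ch, k)
print(std, dws, f"reduction x{std / dws:.1f}")  # roughly an 8-9x reduction here
```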
APA, Harvard, Vancouver, ISO, and other styles
49

Moe, Yngve Mardal, Aurora Rosvoll Groendahl, Oliver Tomic, Einar Dale, Eirik Malinen, and Cecilia Marie Futsaether. "Deep learning-based auto-delineation of gross tumour volumes and involved nodes in PET/CT images of head and neck cancer patients." European Journal of Nuclear Medicine and Molecular Imaging 48, no. 9 (February 9, 2021): 2782–92. http://dx.doi.org/10.1007/s00259-020-05125-x.

Full text
Abstract:
Purpose: Identification and delineation of the gross tumour and malignant nodal volume (GTV) in medical images are vital in radiotherapy. We assessed the applicability of convolutional neural networks (CNNs) for fully automatic delineation of the GTV from FDG-PET/CT images of patients with head and neck cancer (HNC). CNN models were compared to manual GTV delineations made by experienced specialists. New structure-based performance metrics were introduced to enable in-depth assessment of auto-delineation of multiple malignant structures in individual patients. Methods: U-Net CNN models were trained and evaluated on images and manual GTV delineations from 197 HNC patients. The dataset was split into training, validation and test cohorts (n = 142, n = 15 and n = 40, respectively). The Dice score, surface distance metrics and the new structure-based metrics were used for model evaluation. Additionally, auto-delineations were manually assessed by an oncologist for 15 randomly selected patients in the test cohort. Results: The mean Dice scores of the auto-delineations were 55%, 69% and 71% for the CT-based, PET-based and PET/CT-based CNN models, respectively. The PET signal was essential for delineating all structures. Models based on PET/CT images identified 86% of the true GTV structures, whereas models built solely on CT images identified only 55% of the true structures. The oncologist reported very high-quality auto-delineations for 14 out of the 15 randomly selected patients. Conclusions: CNNs provided high-quality auto-delineations for HNC using multimodality PET/CT. The introduced structure-wise evaluation metrics provided valuable information on CNN model strengths and weaknesses for multi-structure auto-delineation.
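A short sketch of the Dice score used for model evaluation above, computed on binary delineation masks; the arrays are random placeholders and this is generic metric code, not the study's pipeline.

```python
# Dice score between a predicted mask and a manual ground-truth mask.
import numpy as np

def dice_score(pred, target, eps=1e-7):
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

pred = np.random.rand(64, 64, 64) > 0.5    # hypothetical auto-delineation
target = np.random.rand(64, 64, 64) > 0.5  # hypothetical manual GTV mask
print(f"Dice = {dice_score(pred, target):.3f}")
```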
APA, Harvard, Vancouver, ISO, and other styles
50

Ayala, Christian, Carlos Aranda, and Mikel Galar. "Multi-Class Strategies for Joint Building Footprint and Road Detection in Remote Sensing." Applied Sciences 11, no. 18 (September 8, 2021): 8340. http://dx.doi.org/10.3390/app11188340.

Full text
Abstract:
Building footprints and road networks are important inputs for a wide range of services. For instance, building maps are useful for urban planning, whereas road maps are essential for disaster response services. Traditionally, building and road maps are manually generated by remote sensing experts or land surveying, occasionally assisted by semi-automatic tools. In the last decade, deep learning-based approaches have demonstrated their capabilities to extract these elements automatically and accurately from remote sensing imagery. The building footprint and road network detection problem can be considered a multi-class semantic segmentation task, that is, a single model performs a pixel-wise classification on multiple classes, optimizing the overall performance. However, depending on the spatial resolution of the imagery used, both classes may coexist within the same pixel, drastically reducing their separability. In this regard, binary decomposition techniques, which have been widely studied in the machine learning literature, have proved useful for addressing multi-class problems. Accordingly, the multi-class problem can be split into multiple binary semantic segmentation sub-problems, specializing different models for each class. Nevertheless, in these cases, an aggregation step is required to obtain the final output labels. Additionally, other novel approaches, such as multi-task learning, may come in handy to further increase the performance of the binary semantic segmentation models. Since there is no certainty as to which strategy should be adopted to accurately tackle a multi-class remote sensing semantic segmentation problem, this paper performs an in-depth study to shed light on the issue. For this purpose, open-access Sentinel-1 and Sentinel-2 imagery (at 10 m) are considered for extracting buildings and roads, making use of the well-known U-Net convolutional neural network. It is worth stressing that building and road classes may coexist within the same pixel when working at such a low spatial resolution, posing a challenging problem. Accordingly, a robust experimental study is developed to assess the benefits of the decomposition strategies and their combination with a multi-task learning scheme. The obtained results demonstrate that decomposing the considered multi-class remote sensing semantic segmentation problem into multiple binary ones using a One-vs.-All binary decomposition technique leads to better results than the standard direct multi-class approach. Additionally, the benefits of using a multi-task learning scheme for pushing the performance of binary segmentation models are also shown.
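A minimal sketch of the One-vs.-All decomposition and aggregation step described above: one binary segmentation model per class (building, road), with a per-pixel argmax over the class probability maps to produce the final multi-class labels. The model outputs are simulated with random probabilities; this is not the paper's code.

```python
# Aggregating two binary segmentation outputs into one multi-class label map.
import numpy as np

h, w = 256, 256
prob_building = np.random.rand(h, w)    # output of the building-vs-rest model
prob_road = np.random.rand(h, w)        # output of the road-vs-rest model
prob_background = 1.0 - np.maximum(prob_building, prob_road)

# Pick the most confident class per pixel.
stacked = np.stack([prob_background, prob_building, prob_road], axis=0)
labels = np.argmax(stacked, axis=0)     # 0 = background, 1 = building, 2 = road
print(np.bincount(labels.ravel(), minlength=3))
```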
APA, Harvard, Vancouver, ISO, and other styles