Journal articles on the topic 'Region Proposal Network (RPN)'

Author: Grafiati

Published: 5 June 2025

Last updated: 8 July 2025

Consult the top 50 journal articles for your research on the topic 'Region Proposal Network (RPN).'

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Liu, Gang, and Chuyi Wang. "A Novel Multi-Scale Feature Fusion Method for Region Proposal Network in Fast Object Detection." International Journal of Data Warehousing and Mining 16, no. 3 (2020): 132–45. http://dx.doi.org/10.4018/ijdwm.2020070107.

Abstract:

Neural network models have been widely used in the field of object detection. Region proposal methods are widely used in current object detection networks and have achieved good performance. Common region proposal methods hunt for objects by generating thousands of candidate boxes. Compared to other region proposal methods, the region proposal network (RPN) method improves accuracy and detection speed with several hundred candidate boxes. However, since the feature maps contain insufficient information, the ability of RPN to detect and locate small-sized objects is poor. This article proposes a novel multi-scale feature fusion method for the region proposal network to solve the above problems. The proposed method, called multi-scale region proposal network (MS-RPN), generates suitable feature maps for the region proposal network. In MS-RPN, the selected feature maps at multiple scales are fine-tuned respectively and compressed into a uniform space. The generated fusion feature maps are called refined fusion features (RFFs). RFFs incorporate abundant detail information and context information, and are sent to the RPN to generate better region proposals. The proposed approach is evaluated on PASCAL VOC 2007 and MS COCO benchmark tasks. MS-RPN obtains significant improvements over the comparable state-of-the-art detection models.

2

Zhang, Ximing, Shujuan Luo, and Xuewu Fan. "Proposal-Based Visual Tracking Using Spatial Cascaded Transformed Region Proposal Network." Sensors 20, no. 17 (2020): 4810. http://dx.doi.org/10.3390/s20174810.

Abstract:

Region proposal network (RPN) based trackers employ a classification and regression block to generate proposals; the proposal with the highest similarity score is taken as the ground-truth candidate for the next frame. However, RPN-based trackers cannot make the best of the features from different convolutional layers, and the original loss function cannot alleviate the data imbalance issue of the training procedure. We propose the Spatial Cascaded Transformed RPN, which combines the RPN with an STN (spatial transformer network) to obtain high-quality proposals and simultaneously improve robustness. The STN can transfer the spatially transformed features through different stages, which extends the spatial representation capability of such networks in handling complex scenarios such as scale variation and affine transformation. We break this restriction through an easy-sample penalization loss (shrinkage loss) instead of the smooth L1 function. Moreover, we perform multi-cue proposal re-ranking to guarantee the accuracy of the proposed tracker. We extensively demonstrate the effectiveness of our proposed method through ablation studies on tracking datasets, including OTB-2015 (Object Tracking Benchmark 2015), VOT-2018 (Visual Object Tracking 2018), LaSOT (Large Scale Single Object Tracking), TrackingNet (A Large-Scale Dataset and Benchmark for Object Tracking in the Wild) and UAV123 (UAV Tracking Dataset).
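
The contrast between the smooth L1 function and a loss that penalizes easy samples can be sketched as follows. This is only an illustration: the abstract does not give the loss formula, so the sigmoid-modulated squared error below is one form in which shrinkage loss is often written, and the parameters `beta`, `a`, and `c` are assumed values rather than the authors' settings.

```python
import math

def smooth_l1(error, beta=1.0):
    """Smooth L1: quadratic for |error| < beta, linear beyond it."""
    e = abs(error)
    return 0.5 * e * e / beta if e < beta else e - 0.5 * beta

def shrinkage_loss(error, a=10.0, c=0.2):
    """Squared error scaled by a sigmoid-shaped factor.

    Small errors (easy samples) are shrunk strongly, large errors
    (hard samples) keep almost their full squared-error penalty.
    a (shrinkage speed) and c (localization) are assumed values.
    """
    l = abs(error)
    return l * l / (1.0 + math.exp(a * (c - l)))

# Compare both losses on an easy, a borderline, and a hard sample.
for err in (0.05, 0.2, 1.0):
    print(err, smooth_l1(err), shrinkage_loss(err))
```

With these assumed parameters, an error of 0.05 keeps under a fifth of its squared-error penalty while an error of 1.0 keeps nearly all of it, which is the rebalancing effect the abstract attributes to the loss.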

3

Dong, Ruchan, Licheng Jiao, Yan Zhang, Jin Zhao, and Weiyan Shen. "A Multi-Scale Spatial Attention Region Proposal Network for High-Resolution Optical Remote Sensing Imagery." Remote Sensing 13, no. 17 (2021): 3362. http://dx.doi.org/10.3390/rs13173362.

Abstract:

Deep convolutional neural networks (DCNNs) are driving progress in object detection of high-resolution remote sensing images. Region proposal generation, as one of the key steps in object detection, has also become the focus of research. High-resolution remote sensing images usually contain objects of various sizes and complex backgrounds, so small objects are easily missed or misidentified in object detection. If the recall rate of region proposals for small objects and multi-scale objects can be improved, the accuracy of object detection will improve. Spatial attention is the ability to focus on local features in images and can improve the learning efficiency of DCNNs. This study proposes a multi-scale spatial attention region proposal network (MSA-RPN) for high-resolution optical remote sensing imagery. The MSA-RPN is an end-to-end deep learning network with a ResNet backbone. It deploys three novel modules to fulfill its task. First, the Scale-specific Feature Gate (SFG) focuses on features of objects by processing multi-scale features extracted from the backbone network. Second, the spatial attention-guided model (SAGM) obtains spatial information of objects from the multi-scale attention maps. Third, the Selective Strong Attention Maps Model (SSAMM) adaptively selects sliding windows according to the loss values from the system’s feedback, and sends the windowed samples to the spatial attention decoder. Finally, the candidate regions and their corresponding confidences can be obtained. We evaluate the proposed network on the public LEVIR dataset and compare it with several state-of-the-art methods. The proposed MSA-RPN yields a higher recall rate of region proposal generation, especially for small targets in remote sensing images.

4

Rahmani, K., and H. Mayer. "HIGH QUALITY FACADE SEGMENTATION BASED ON STRUCTURED RANDOM FOREST, REGION PROPOSAL NETWORK AND RECTANGULAR FITTING." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences IV-2 (May 28, 2018): 223–30. http://dx.doi.org/10.5194/isprs-annals-iv-2-223-2018.

Abstract:

In this paper we present a pipeline for high-quality semantic segmentation of building facades using a Structured Random Forest (SRF), a Region Proposal Network (RPN) based on a Convolutional Neural Network (CNN), and rectangular fitting optimization. Our main contribution is that we employ features created by the RPN as channels in the SRF. We empirically show that this is very effective, especially for doors and windows. Our pipeline is evaluated on two datasets where we outperform current state-of-the-art methods. Additionally, we quantify the contribution of the RPN and the rectangular fitting optimization to the accuracy of the result.

5

Wang, Yanke, Qidan Zhu, Wenchang Nie, and Hong Xiao. "Do tracking by clustering anchors output from region proposal network." MATEC Web of Conferences 246 (2018): 03006. http://dx.doi.org/10.1051/matecconf/201824603006.

Abstract:

Most existing clustering algorithms suffer from the computation of the similarity function and the representation of each object. In this paper, we propose a clustering tracker based on a region proposal network (RPN-C) that tracks by clustering the anchors output by the region proposal network into potential centers. We first cut off the second part of Faster RCNN and then apply clustering algorithms in the feature space of the anchors, including K-Means, mean shift and a density peak clustering strategy, in terms of the anchors’ centroid and scale information. Without fully connected layers, the RPN-C tracker can lower the computational cost by up to 60% and still maintain an accurate prediction of the localization in the next frame. To evaluate the robustness of this tracker, we establish a dataset containing over 2000 training images and 7 testing sequences of 8 kinds of fruits. The experimental results on our own datasets demonstrate that the proposed tracker performs excellently both in locating the object and in deciding its scale, and has a strong advantage in stability in the presence of occlusion and complicated backgrounds.
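
The idea of clustering anchors into potential centers can be sketched with plain K-Means. This is a minimal illustration under stated assumptions, not the RPN-C tracker itself: it clusters anchor centroids only (the abstract also uses scale information), the function name `kmeans` and the sample points are hypothetical, and seeding with the first `k` points is chosen only to keep the example deterministic.

```python
def kmeans(points, k, iters=20):
    """Lloyd's K-Means on 2-D points, seeded with the first k points."""
    centres = [points[i] for i in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centres[i][0]) ** 2 + (p[1] - centres[i][1]) ** 2,
            )
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its previous centre).
        centres = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
    return centres

# Hypothetical anchor centroids forming two clumps.
anchor_centroids = [(10, 10), (100, 100), (12, 9), (11, 11), (98, 102), (102, 98)]
centres = kmeans(anchor_centroids, k=2)
print(centres)
```

Each returned centre is a candidate object location; a tracker along these lines would then pick the centre most consistent with the previous frame.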

6

J, Srilatha, S. Subashini T, and Vaidehi K. "Solid Waste Detection and Recognition using Faster RCNN." Indian Journal of Science and Technology 16, no. 42 (2023): 3778–85. https://doi.org/10.17485/IJST/v16i42.2005.

Abstract:

Objective: To develop a two-stage object detection method based on convolutional neural networks (CNNs) to identify and classify solid waste, contributing to the creation of intelligent systems for society. Methods: The study utilizes a base network, ResNet 101, to generate convolution feature maps. In the first stage, a Region Proposal Network (RPN) is created on top of these convolution features, producing 256-dimensional feature vectors, objectness scores, and bounding rectangles for different anchor boxes. In the next stage, the region proposals are used to train a softmax layer and regressor, enabling the classification and localization of five types of solid waste, namely cardboard, glass, metal, paper and plastic. Findings: The proposed Faster RCNN demonstrates nearly real-time object detection rates. Experimental results reveal that the Faster RCNN with ResNet 101 and RPN achieves an accuracy of 96.7%, outperforming the Faster RCNN with a simple CNN, which achieves an accuracy of 86.7%. Novelty: Unlike traditional R-CNN, which relies on computationally inefficient selective search, the proposed Faster RCNN employs RPN, a small neural network sliding on the last convolution layer's feature map, predicting object presence and bounding boxes. This approach significantly improves efficiency compared to the exhaustive examination in R-CNN's selective search. Keywords: Object Detection, RCNN, Fast RCNN, Faster RCNN, RPN, ROI pooling
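
The anchor boxes over which the RPN predicts objectness scores and bounding rectangles can be illustrated with a short sketch. The stride of 16 and the three scales and three aspect ratios are assumptions borrowed from the defaults commonly associated with the original Faster R-CNN design; the abstract above does not state which values the study used, and `generate_anchors` is a hypothetical helper name.

```python
import math

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centred on every feature-map cell.

    Every cell gets len(scales) * len(ratios) anchors. Each anchor has
    area scale**2 and aspect ratio (height / width) equal to ratio.
    All default values are assumptions for illustration.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this cell in input-image pixel coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = generate_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # 2 * 3 cells, 9 anchors per cell
```

In a full detector, the sliding network would emit one objectness score and four box offsets per anchor; only the anchor layout is shown here.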

7

He, Dongfang, Jiajun Wen, and Zhihui Lai. "Textile Fabric Defect Detection Based on Improved Faster R-CNN." AATCC Journal of Research 8, no. 1_suppl (2021): 82–90. http://dx.doi.org/10.14504/ajr.8.s1.11.

Abstract:

To identify and locate industrial textile defects accurately, this study proposes a textile detection model based on a convolutional neural network (CNN) known as Faster R-CNN. First, a textile defect feature map is extracted by the ResNet-101 deep convolutional network. Faster R-CNN only extracts features from the last layer of the feature map, which leads to a loss of low-level location information. The proposed method adds a feature pyramid network (FPN) to the network architecture to make an independent prediction for each level in the feature extraction stage. The extracted feature map is input into the region proposal network, where the overlapping region proposals are suppressed. The proposed improved Faster R-CNN model with Region Proposal Network (RPN), Soft Non-Maximum Suppression (NMS), and Region of Interest (ROI) Align can achieve a detection accuracy of 98% and a mean Average Precision (mAP) of 85%, which is more competitive than the state-of-the-art deep learning-based object detection algorithms.
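
Soft Non-Maximum Suppression, mentioned above, lowers the scores of overlapping proposals instead of discarding them outright. The sketch below uses the Gaussian decay variant; the abstract does not say which variant or parameters the study used, so `sigma`, `score_thresh`, and the sample boxes are assumptions for illustration.

```python
import math

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay scores of overlapping boxes, drop only tiny ones."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Take the highest-scoring box still under consideration.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        box, score = remaining.pop(best)
        kept.append((box, score))
        # Decay every other box by its overlap with the chosen one.
        decayed = []
        for other, s in remaining:
            s *= math.exp(-(iou(box, other) ** 2) / sigma)
            if s >= score_thresh:
                decayed.append((other, s))
        remaining = decayed
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
print(kept)
```

Here the second box overlaps the first heavily, so its score is reduced and it drops below the isolated third box in the ranking, whereas hard NMS with a typical threshold would have removed it.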

8

Wang, Rujing, Lin Jiao, Chengjun Xie, Peng Chen, Jianming Du, and Rui Li. "S-RPN: Sampling-balanced region proposal network for small crop pest detection." Computers and Electronics in Agriculture 187 (August 2021): 106290. http://dx.doi.org/10.1016/j.compag.2021.106290.

9

Li, Danhua, Xiaofeng Di, Xuan Qu, Yunfei Zhao, and Honggang Kong. "Deep Convolutional Neural Network for Pedestrian Detection with Multi-Levels Features Fusion." MATEC Web of Conferences 232 (2018): 01061. http://dx.doi.org/10.1051/matecconf/201823201061.

Abstract:

Pedestrian detection aims to localize and recognize every pedestrian instance in an image with a bounding box. The current state-of-the-art method is Faster RCNN, a network that uses a region proposal network (RPN) to generate high-quality region proposals, while Fast RCNN is used to classify the extracted features into corresponding categories. The contribution of this paper is to integrate low-level and high-level features into a Faster RCNN-based pedestrian detection framework, which efficiently increases the capacity of the features. We comprehensively evaluate our framework on the Caltech pedestrian detection benchmark, where our method achieves state-of-the-art accuracy and presents a competitive result.

10

Rana, Aayush Jung, and Yogesh S. Rawat. "SSA2D: Single Shot Actor-Action Detection in Videos (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (2021): 15875–76. http://dx.doi.org/10.1609/aaai.v35i18.17934.

Abstract:

We propose a single-shot approach for actor-action detection in videos. Existing approaches use a two-step process that relies on a Region Proposal Network (RPN), where the action is estimated based on the detected proposals, followed by post-processing such as non-maximal suppression. While effective in terms of performance, these methods pose limitations in scalability for dense video scenes, with a high memory requirement for thousands of proposals, which leads to slow processing time. We propose SSA2D, a unified end-to-end deep network, which performs joint actor-action detection in a single shot without the need for any proposals or post-processing, making it both memory and time efficient.

11

Mekhala, Sri Devi Sameera, Yogitha Sindhu Bachu, Pragathi Guddanti, Hema Sai Chandu Guttikonda, Sneha Sree Vemulapalli, and Yasmitha Devarapalli. "Detection and Classification of Tumors in Brain based on the Location by using MRI." International Journal for Research in Applied Science and Engineering Technology 12, no. 3 (2024): 1110–14. http://dx.doi.org/10.22214/ijraset.2024.58702.

Abstract:

Brain tumors are a serious disease in humans. The treatment process mainly depends on the tumor type and its location. The final decision of the neuro specialist and radiologist in tumor diagnosis depends mainly on the evaluation of MRI (Magnetic Resonance Imaging) images. To overcome this, a Faster R-CNN deep learning algorithm is proposed for detecting the tumor and marking the area of its occurrence with a Region Proposal Network (RPN). The selected MR image dataset consists of three primary brain tumors, namely glioma, meningioma, and pituitary. The proposed algorithm uses the VGG-19 architecture as a base layer for both the region proposal network and the classifier network.

12

Deng, Xiaoling, Zejing Tong, Yubin Lan, and Zixiao Huang. "Detection and Location of Dead Trees with Pine Wilt Disease Based on Deep Learning and UAV Remote Sensing." AgriEngineering 2, no. 2 (2020): 294–307. http://dx.doi.org/10.3390/agriengineering2020019.

Abstract:

Pine wilt disease causes huge economic losses to pine wood forestry because of its destructiveness and rapid spread. This paper proposes a large-scale detection and location method for pine wood nematode disease that adopts UAV (Unmanned Aerial Vehicle) remote sensing and artificial intelligence technology. The UAV remote sensing images were enhanced by computer vision tools. A Faster-RCNN (Faster Region Convolutional Neural Networks) deep learning framework based on an RPN (Region Proposal Network) and the ResNet residual neural network was used to train the detection model for dead trees with pine wilt disease. The loss function and the anchors in the RPN of the convolutional neural network were optimized. Finally, the dead trees affected by pine wood nematode were located, which generated geographic information for the detection results. The results show that ResNet101 performed better than the VGG16 (Visual Geometry Group 16) convolutional neural network. The detection accuracy improved and reached about 90% after a series of optimizations to the network, meaning that the optimization methods proposed in this paper are feasible for detecting dead trees affected by pine wood nematode.

13

Xu, Hengxin, Lei Yang, Shengya Zhao, Shan Tao, Xinran Tian, and Kun Liu. "SPS-RCNN: Semantic-Guided Proposal Sampling for 3D Object Detection from LiDAR Point Clouds." Sensors 25, no. 4 (2025): 1064. https://doi.org/10.3390/s25041064.

Abstract:

Three-dimensional object detection using LiDAR has attracted significant attention due to its resilience to lighting conditions and ability to capture detailed geometric information. However, existing methods still face challenges, such as a high proportion of background points in the sampled point set and limited accuracy in detecting distant objects. To address these issues, we propose semantic-guided proposal sampling-RCNN (SPS-RCNN), a multi-stage detection framework based on point–voxel fusion. The framework comprises three components: a voxel-based region proposal network (RPN), a keypoint sampling stream (KSS), and a progressive refinement network (PRN). In the KSS, we propose a novel semantic-guided proposal sampling (SPS) method, which increases the proportion of foreground points and enhances sensitivity to outliers through multilevel sampling that integrates proposal-based local sampling and semantic-guided global sampling. In the PRN, a cascade attention module (CAM) is employed to aggregate features from multiple subnets, progressively refining region proposals to improve detection accuracy for medium- and long-range objects. Comprehensive experiments on the widely used KITTI dataset demonstrate that SPS-RCNN improves detection accuracy and exhibits enhanced robustness across categories compared to the baseline.

14

Zhuo, Li, Bin Liu, Hui Zhang, Shiyu Zhang, and Jiafeng Li. "MultiRPN-DIDNet: Multiple RPNs and Distance-IoU Discriminative Network for Real-Time UAV Target Tracking." Remote Sensing 13, no. 14 (2021): 2772. http://dx.doi.org/10.3390/rs13142772.

Abstract:

Target tracking in low-altitude Unmanned Aerial Vehicle (UAV) videos faces many technical challenges due to the relatively small sizes and various orientation changes of the objects and the diverse scenes. As a result, the tracking performance is still not satisfactory. In this paper, we propose a real-time single-target tracking method with multiple Region Proposal Networks (RPNs) and a Distance-Intersection-over-Union (Distance-IoU) Discriminative Network (DIDNet), namely MultiRPN-DIDNet, in which ResNet50 is used as the backbone network for feature extraction. Firstly, an instance-based RPN suitable for the target tracking task is constructed under the framework of a Siamese neural network. The RPN performs bounding box regression and classification, and a channel attention mechanism is integrated into it to improve the representative capability of the deep features. The RPNs built on Block 2, Block 3 and Block 4 of ResNet50 output their own Regression (Reg) coefficients and Classification scores (Cls) respectively, which are weighted and then fused to determine the high-quality region proposals. Secondly, a DIDNet is designed to finely correct the candidate target’s bounding box through the fusion of multi-layer features, and is trained with the Distance-IoU loss. Experimental results on the public datasets UAV20L and DTB70 show that, compared with state-of-the-art UAV trackers, the proposed MultiRPN-DIDNet can obtain better tracking performance with fewer region proposals and correction iterations. As a result, the tracking speed reaches 33.9 frames per second (FPS), which can meet the requirements of real-time tracking tasks.
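
The Distance-IoU loss used to train DIDNet can be sketched for a single pair of boxes. This follows the usual definition of the loss (one minus IoU, plus the squared distance between box centres divided by the squared diagonal of the smallest enclosing box); how the paper batches or weights it is not stated in the abstract, and the function name `diou_loss` and the sample boxes are hypothetical.

```python
def diou_loss(pred, target):
    """Distance-IoU loss for two (x1, y1, x2, y2) boxes."""
    # Plain IoU term.
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    union = area_p + area_t - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centres.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    centre_dist_sq = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

    # Squared diagonal of the smallest box enclosing both.
    ex1, ey1 = min(pred[0], target[0]), min(pred[1], target[1])
    ex2, ey2 = max(pred[2], target[2]), max(pred[3], target[3])
    diag_sq = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    penalty = centre_dist_sq / diag_sq if diag_sq > 0 else 0.0
    return 1.0 - iou + penalty

print(diou_loss((0, 0, 10, 10), (0, 0, 10, 10)))   # identical boxes
print(diou_loss((0, 0, 10, 10), (20, 0, 30, 10)))  # disjoint boxes
```

Unlike a plain IoU loss, the distance penalty still varies when the boxes do not overlap, so a box regressor receives a useful signal even for poor initial proposals.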

15

Shi, Yundong, Huimin Wang, Chao Jing, and Xingzhong Zhang. "A Few-Shot Defect Detection Method for Transmission Lines Based on Meta-Attention and Feature Reconstruction." Applied Sciences 13, no. 10 (2023): 5896. http://dx.doi.org/10.3390/app13105896.

Abstract:

In transmission line defect detection tasks, traditional object detection algorithms are ineffective because few training samples of defective components are available. Meta-learning uses multi-task learning as well as fine-tuning to learn common features across different tasks; it can adapt to new tasks quickly, performs well in few-shot object detection, and generalizes well to new tasks. For this reason, we propose a few-shot defect detection method for transmission lines based on meta-learning (Meta PowerNet), with a Meta-attention RPN and a Feature Reconstruction Module. First, in the region proposal stage, a new region proposal network (Meta-Attention Region Proposal Network, MA-RPN) is designed to fuse the support set features and the query set features to filter the noise in anchor boxes. In addition, it can focus on the subtle texture features of smaller-sized objects by fusing low-level features from the query set. Second, in the meta-feature construction stage, we design a meta-learner with the defect feature reconstruction module as its core to capture and focus on the defect-related feature channels. The experimental results show that, with only 30 training objects for the various types of component defects, the method achieves 72.5% detection accuracy for component defects, which is a significant improvement compared with other mainstream few-shot object detection methods. Meanwhile, the MA-RPN designed in this paper can be used universally in other meta-learning object detection models.

16

Hachchane, Imane, Abdelmajid Badri, Aïcha Sahel, and Yassine Ruichek. "Large-scale image-to-video face retrieval with convolutional neural network features." IAES International Journal of Artificial Intelligence (IJ-AI) 9, no. 1 (2020): 40. http://dx.doi.org/10.11591/ijai.v9.i1.pp40-45.

Abstract:

Convolutional neural network features are becoming the norm in instance retrieval. This work investigates the relevance of using an off-the-shelf object detection network, like Faster R-CNN, as a feature extractor for an image-to-video face retrieval pipeline instead of using hand-crafted features. We use the object proposals learned by a Region Proposal Network (RPN) and their associated representations taken from a CNN for the filtering and the re-ranking steps. Moreover, we study the relevance of features from a fine-tuned network. In addition, we explore the use of face detection, Fisher vectors and bag of visual words with those CNN features. We also test the impact of different similarity metrics. The results obtained are very promising.

17

Imane, Hachchane, Badri Abdelmajid, Sahel Aïcha, and Ruichek Yassine. "Large-scale image-to-video face retrieval with convolutional neural network features." International Journal of Artificial Intelligence (IJ-AI) 9, no. 1 (2020): 40–45. https://doi.org/10.11591/ijai.v9.i1.pp40-45.

Abstract:

Convolutional neural network features are becoming the norm in instance retrieval. This work investigates the relevance of using an off-the-shelf object detection network, like Faster R-CNN, as a feature extractor for an image-to-video face retrieval pipeline instead of using hand-crafted features. We use the object proposals learned by a Region Proposal Network (RPN) and their associated representations taken from a CNN for the filtering and the re-ranking steps. Moreover, we study the relevance of features from a fine-tuned network. In addition, we explore the use of face detection, Fisher vectors and bag of visual words with those same CNN features. We also test the impact of different similarity metrics. The results obtained are very promising.

18

J, Samson Immanuel, Manoj G, and Divya P. S. "Performance Metric Estimation of Fast RCNN with VGG-16 Architecture for Emotional Recognition." International Journal of Applied Mathematics, Computational Science and Systems Engineering 4 (June 25, 2022): 30–38. http://dx.doi.org/10.37394/232026.2022.4.4.

Abstract:

Faster R-CNN is a state-of-the-art universal object detection approach based on a convolutional neural network that offers object limits and objectness scores at each location in an image at the same time. To hypothesize object locations, state-of-the-art object detection networks rely on region proposal techniques. The accuracy of ML/DL models has been shown to be limited in the past due to a range of issues, including wavelength selection, spatial resolution, and hyperparameter selection and tuning. The goal of this study is to create a new automated emotional detection system based on the CK+ database. Fast R-CNN has lowered the detection network’s operating time, revealing region proposal computation as a bottleneck. We develop a Region Proposal Network (RPN) in this paper that shares full-image convolutional features with the detection network, allowing for almost cost-free region suggestions. The suggested VGG-16 Fast RCNN model obtained user accuracy close to 100 percent in the emotion class, followed by VGG-16 (99.79 percent), Alexnet (98.58 percent), and Googlenet (98.58 percent) (98.32 percent). After extensive hyperparameter tuning for emotional recognition, the generated Fast RCNN VGG-16 model showed an overall accuracy of 99.79 percent, far higher than previously published results.

19

Ding, Xintao, Boquan Li, and Jinbao Wang. "Geometric property-based convolutional neural network for indoor object detection." International Journal of Advanced Robotic Systems 18, no. 1 (2021): 172988142199332. http://dx.doi.org/10.1177/1729881421993323.

Abstract:

Indoor object detection is a very demanding and important task for robot applications. Object knowledge, such as two-dimensional (2D) shape and depth information, may be helpful for detection. In this article, we focus on region-based convolutional neural network (CNN) detector and propose a geometric property-based Faster R-CNN method (GP-Faster) for indoor object detection. GP-Faster incorporates geometric property in Faster R-CNN to improve the detection performance. In detail, we first use mesh grids that are the intersections of direct and inverse proportion functions to generate appropriate anchors for indoor objects. After the anchors are regressed to the regions of interest produced by a region proposal network (RPN-RoIs), we then use 2D geometric constraints to refine the RPN-RoIs, in which the 2D constraint of every classification is a convex hull region enclosing the width and height coordinates of the ground-truth boxes on the training set. Comparison experiments are implemented on two indoor datasets SUN2012 and NYUv2. Since the depth information is available in NYUv2, we involve depth constraints in GP-Faster and propose 3D geometric property-based Faster R-CNN (DGP-Faster) on NYUv2. The experimental results show that both GP-Faster and DGP-Faster increase the performance of the mean average precision.

20

Dai, Wenxin, Yuqing Mao, Rongao Yuan, Yijing Liu, Xuemei Pu, and Chuan Li. "A Novel Detector Based on Convolution Neural Networks for Multiscale SAR Ship Detection in Complex Background." Sensors 20, no. 9 (2020): 2547. http://dx.doi.org/10.3390/s20092547.

Abstract:

Convolutional neural network (CNN)-based detectors have shown great performance on ship detection in synthetic aperture radar (SAR) images. However, the performance of current models has not been satisfactory for detecting multiscale ships and small ones against complex backgrounds. To address the problem, we propose a novel CNN-based SAR ship detector, which consists of three subnetworks: the Fusion Feature Extractor Network (FFEN), the Region Proposal Network (RPN), and the Refine Detection Network (RDN). Instead of using a single feature map, we fuse feature maps in bottom-up and top-down ways and generate proposals from each fused feature map in FFEN. Furthermore, we merge the features generated by the region-of-interest (RoI) pooling layer in RDN. Based on this feature representation strategy, the constructed CNN framework can significantly enhance the location and semantic information for multiscale ships, in particular for small ships. In addition, a residual block is introduced to increase the network depth, through which the detection precision can be further improved. The public SAR ship dataset (SSDD) and China Gaofen-3 satellite SAR images are used to validate the proposed method. Our method shows excellent performance in detecting multiscale and small ships compared with some competitive models and exhibits high potential in practical application.

21

Zhu, Ye, Xiaoqian Shen, Shikun Liu, Xiaoli Zhang, and Gang Yan. "Image Splicing Location Based on Illumination Maps and Cluster Region Proposal Network." Applied Sciences 11, no. 18 (2021): 8437. http://dx.doi.org/10.3390/app11188437.

Abstract:

Splicing is the most common operation in image forgery, where the tampered background regions are imported from different images. Illumination maps are an inherent attribute of images and provide significant clues when searching for splicing locations. This paper proposes an end-to-end dual-stream network for splicing location, where the illumination stream, which includes Grey-Edge (GE) and Inverse-Intensity Chromaticity (IIC), extracts the inconsistent features, and the image stream extracts the global unnatural tampered features. The dual-stream features in our network are fused through a Multiple Feature Pyramid Network (MFPN), which contains richer context information. Finally, a Cluster Region Proposal Network (C-RPN) with spatial attention and an adaptive cluster anchor is proposed to generate potential tampered regions with greater retention of location information. Extensive experiments on the NIST16 and CASIA standard datasets show that our proposed algorithm is superior to some state-of-the-art algorithms, because it achieves accurate tampered locations at the pixel level and is highly robust to post-processing operations such as noise, blur and JPEG recompression.

22

Shao, Faming, Xinqing Wang, Fanjie Meng, Jingwei Zhu, Dong Wang, and Juying Dai. "Improved Faster R-CNN Traffic Sign Detection Based on a Second Region of Interest and Highly Possible Regions Proposal Network." Sensors 19, no. 10 (2019): 2288. http://dx.doi.org/10.3390/s19102288.

Abstract:

Traffic sign detection systems provide important road control information for unmanned driving systems or auxiliary driving. In this paper, the Faster region-based convolutional neural network (Faster R-CNN) for traffic sign detection in real traffic situations has been systematically improved. First, a first-step region proposal algorithm based on simplified Gabor wavelets (SGWs) and maximally stable extremal regions (MSERs) is proposed. In this way, a priori region proposal information is obtained and used to improve the Faster R-CNN. This part of our method is named the highly possible regions proposal network (HP-RPN). Second, in order to solve the problem that the Faster R-CNN cannot effectively detect small targets, a method that combines the features of the third, fourth, and fifth layers of VGG16 to enrich the features of small targets is proposed. Third, a secondary region of interest method is proposed to enhance the features of detected objects and improve the classification capability of the Faster R-CNN. Finally, a method of merging the German traffic sign detection benchmark (GTSDB) and Chinese traffic sign dataset (CTSD) databases into one larger database to increase the number of database samples is proposed. Experimental results show that our method improves the detection performance, especially for small targets.

23

Hachchane, Imane, Abdelmajid Badri, Aïcha Sahel, Ilham Elmourabit, and Yassine Ruichek. "Image and video face retrieval with query image using convolutional neural network features." IAES International Journal of Artificial Intelligence (IJ-AI) 11, no. 1 (2022): 102. http://dx.doi.org/10.11591/ijai.v11.i1.pp102-109.

Abstract:

This paper addresses the issue of image and video face retrieval. The aim of this work is to retrieve images and/or videos of a specific person from a dataset of images and videos, given a query image of that person. The methods proposed so far focus on either images or videos and use hand-crafted features. In this work we built an end-to-end pipeline for both image and video face retrieval, where we use convolutional neural network (CNN) features from an off-line feature extractor. We also exploit the object proposals learned by a region proposal network (RPN) in the online filtering and re-ranking steps. Moreover, we study the impact of fine-tuning the networks, the impact of sum-pooling and max-pooling, and the impact of different similarity metrics. The results that we were able to achieve are very promising.
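
To make the pooling and similarity comparison in this abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a feature map stored as nested lists of shape channels × height × width; the helper names (`sum_pool`, `max_pool`, `cosine_similarity`) and the toy values are hypothetical and chosen only for illustration.

```python
import math

def sum_pool(feature_map):
    """Collapse a C x H x W feature map into a C-dim descriptor by summing each channel."""
    return [sum(sum(row) for row in channel) for channel in feature_map]

def max_pool(feature_map):
    """Collapse a C x H x W feature map into a C-dim descriptor by taking each channel's maximum."""
    return [max(max(row) for row in channel) for channel in feature_map]

def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def cosine_similarity(a, b):
    """Cosine similarity, one possible metric for ranking candidates against a query."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Toy 2-channel, 2x2 feature maps (illustrative values, not real CNN activations).
query = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
candidate = [[[2.0, 0.0], [0.0, 2.0]], [[0.0, 4.0], [4.0, 0.0]]]

query_sum = sum_pool(query)
query_max = max_pool(query)
similarity = cosine_similarity(query_sum, sum_pool(candidate))
```

Sum-pooling keeps the total activation per channel, while max-pooling keeps only the strongest response; which one retrieves better is an empirical question of the kind the abstract says it studies.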

24

Imane, Hachchane, Badri Abdelmajid, Sahel Aïcha, Elmourabit Ilham, and Ruichek Yassine. "Image and video face retrieval with query image using convolutional neural network features." International Journal of Artificial Intelligence (IJ-AI) 11, no. 1 (2022): 102–9. https://doi.org/10.11591/ijai.v11.i1.pp102-109.

Abstract:

This paper addresses the issue of image and video face retrieval. The aim of this work is to retrieve images and/or videos of a specific person from a dataset of images and videos, given a query image of that person. The methods proposed so far focus on either images or videos and use hand-crafted features. In this work we built an end-to-end pipeline for both image and video face retrieval, where we use convolutional neural network (CNN) features from an off-line feature extractor. We also exploit the object proposals learned by a region proposal network (RPN) in the online filtering and re-ranking steps. Moreover, we study the impact of fine-tuning the networks, the impact of sum-pooling and max-pooling, and the impact of different similarity metrics. The results that we were able to achieve are very promising.

25

Yang, Senlin, Xin Chong, Xilong Li, and Ruixing Li. "Intelligent Intersection Vehicle and Pedestrian Detection Based on Convolutional Neural Network." Journal of Sensors 2022 (March 11, 2022): 1–11. http://dx.doi.org/10.1155/2022/8445816.

Abstract:

The preprocessed images are input to a pretrained neural network to obtain the corresponding feature maps, and a region of interest is set for each point in the feature maps to obtain multiple candidate feature regions. These candidate regions are then fed into a region proposal network and a deep residual network for binary classification and bounding-box (BB) regression, some of the candidate regions are filtered out, and the remaining regions undergo the RoIAlign operation. Finally, classification, BB regression, and mask generation are performed on these regions, and a fully convolutional network operation is performed in each region to produce the output. To further identify the specific model of the vehicle, this paper proposes a multifeature model recognition method that fuses the improved model with the optimized Mask R-CNN algorithm. A vehicle local feature dataset including vehicle badges, lights, air intake grille, and whole vehicle outline is established to simplify the network structure of the model. Meanwhile, its detection frame generation process and the adjustment rules for the confidence of overlapping frames in nonmaximum suppression are improved for coarse vehicle localization. The localized vehicle detection frames are then output to the Mask R-CNN algorithm, after further optimization of the RPN structure, for local feature recognition, and good recognition results are achieved. Finally, this paper establishes a distributed server-based vehicle recognition system, which mainly includes a database module, file module, feature extraction and matching module, message queue module, WEB module, and vehicle detection module.
Due to the limitations of traditional region generation methods, this paper provides a brief analysis of the region generation network in the Faster R-CNN algorithm and details the loss calculation principle of the output layer.
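
The abstract refers to nonmaximum suppression (NMS) over overlapping detection frames. The sketch below shows only the standard greedy NMS baseline, not the paper's improved confidence-adjustment rules, which the abstract does not specify. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the 0.5 threshold and example boxes are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, drop any box that overlaps
    an already kept box by more than iou_threshold. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Two heavily overlapping frames and one separate frame (hypothetical values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of 81/119 (about 0.68) and is suppressed, so only the first and third boxes survive. Variants that lower the confidence of overlapping frames instead of discarding them would replace the hard drop inside the loop.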

26

Kollapudi, Purnachand, Mydhili K. Nair, S. Parthiban, et al. "A Novel Faster RCNN with ODN-Based Rain Removal Technique." Mathematical Problems in Engineering 2022 (May 4, 2022): 1–11. http://dx.doi.org/10.1155/2022/4546135.

Abstract:

During rainy weather, the performance of outdoor vision systems is considerably degraded by the visibility barrier, distortion, and blurring caused by raindrops. It is therefore essential to remove rain from rainy images to ensure the reliability of outdoor vision systems. To achieve this, several rain removal studies have been performed recently. In this view, this paper presents a new Faster Region Convolutional Neural Network (Faster RCNN) with Optimal Densely Connected Networks (DenseNet)-based rain removal technique called FRCNN-ODN. In the presented technique, weighted mean filtering (WMF) is applied as a denoising step, which helps to boost the quality of the input image. In addition, the Faster RCNN technique, which comprises a region proposal network (RPN) and the Fast RCNN model, is used for rain detection. The RPN generates high-quality region proposals that are exploited by the Faster RCNN to detect raindrops. Also, the DenseNet model is utilized as a baseline network to generate the feature map. Moreover, the sparrow search optimization algorithm (SSOA) is applied to choose the hyperparameters of the DenseNet model, namely learning rate, batch size, momentum, and weight decay. An extensive experimental validation is performed to highlight the effectiveness of the FRCNN-ODN model, and the results are investigated with respect to several dimensions. The FRCNN-ODN method produced higher UIQI values of 0.981, 0.982, and 0.998 on applied images 1, 2, and 3, respectively. The simulation outcome showcased the superior performance of the FRCNN-ODN model over existing methods in terms of distinct measures.

27

Jacob, Benjamin, Heather McDonald, and Joe Bohn. "Closing the Gap on Addiction Recovery Engagement with an AI-infused Convolutional Neural Network Technology Application—A Design Vision." American Journal of Neural Networks and Applications 10, no. 1 (2024): 1–14. http://dx.doi.org/10.11648/j.ajnna.20241001.11.

Abstract:

Currently, real-time detection networks elaborate the technical details of the Faster Region-based Convolutional Neural Network (Faster R-CNN) recognition pipeline. Within existing R-CNN literature, the evolution exhibited by R-CNN is most profound in terms of computational efficiency, integrating each training stage to reduce test time, and improvement in mean average precision (mAP), which can be infused into an artificially intelligent (AI), machine learning (ML), real-time, interactive, recovery capital application (app). This article introduces a Region Proposal Network (RPN) that shares full-image convolutional features with a real-time detection AI-ML infused network in an interactive, continuously self-learning wrist-wearable real-time recovery capital app for enabling cost-free region proposals (e.g., instantaneous body physiological responses, mapped connections to emergency services, sponsor, counselor, peer support, links to local and specific recovery capital assets, etc.). A fully merged RPN and Faster R-CNN deep convolutional unified network in the app can simultaneously train to aggregate and predict object bounds and objectness scores for implementing recovery capital real-time solutions (e.g., baseball card scoring dashboards, token-based incentive programs, etc.). A continuous training scheme alternates between fine-tuning RPN tasks (e.g., logging and updating personal client information, gamification orientation) and fine-tuning the detection (e.g., real-time biometric monitoring of the client’s behavior for self-awareness of when to connect with an addiction specialist or family member, quick response (QR) code registration for a 12-step program, advanced security encryption, etc.) in the interactive app.
The very deep VGG-16 model detection system has a frame rate of 5 fps on a graphics processing unit (GPU) while accomplishing sophisticated object detection accuracy on the PASCAL Visual Object Classes Challenge (PASCAL VOC) and Microsoft Common Objects in Context (MS COCO) datasets. This is achieved with only 300 proposals per real-time retrieved data capture point, information bit, or image. The app has real-time, infused cartographic and statistical tracking tools to generate Python code, which can enable a gamified addiction recovery-oriented digital conscience. Faster R-CNN and RPN can be the foundations of an interactive real-time recovery capital app that can be adapted to multiple recovery pathways based on participant recovery plans and actions. This paper discusses some of the critical attributes and features to include in the design of a future app to support and close current gaps in needed recovery capital to help those who are dealing with many different forms of addiction recovery.

28

Malini, A., P. Priyadharshini, and S. Sabeena. "An automatic assessment of road condition from aerial imagery using modified VGG architecture in faster-RCNN framework." Journal of Intelligent & Fuzzy Systems 40, no. 6 (2021): 11411–22. http://dx.doi.org/10.3233/jifs-202596.

Abstract:

This work aims to develop a surveillance and detection system that automates road maintenance work, which is currently carried out by surveying and inspecting roads manually. The need for the system lies in the fact that traditional methods are time-consuming, tiresome, and require a huge workforce. This paper proposes an automation system using an Unmanned Aerial Vehicle, which monitors and detects pavement defects such as cracks and potholes by processing real-time video footage of Indian highways. The collected data is processed and stored as images in a road defects database, which serves as input for the system. The behavior of the Region Proposal Network (RPN) is made smooth by varying the number of region proposals utilized in the model. A regularization technique called dropout is used to achieve higher performance in the proposed Faster Region-based Convolutional Neural Network (Faster RCNN) object detection model. The detections are made with 62.3% mean Average Precision at Intersection over Union (IoU) ≥ 0.5 when 300 region proposals are generated, which is a good score for object detection. The comparisons between the proposed and existing systems show that the proposed Faster RCNN with modified VGG-16 performs better than the existing variants.

29

Li, Jianxiang, Yan Tian, Yiping Xu, and Zili Zhang. "Oriented Object Detection in Remote Sensing Images with Anchor-Free Oriented Region Proposal Network." Remote Sensing 14, no. 5 (2022): 1246. http://dx.doi.org/10.3390/rs14051246.

Abstract:

Oriented object detection is a fundamental and challenging task in remote sensing image analysis that has recently drawn much attention. Currently, mainstream oriented object detectors are based on densely placed predefined anchors. However, the high number of anchors aggravates the positive and negative sample imbalance problem, which may lead to duplicate detections or missed detections. To address the problem, this paper proposes a novel anchor-free two-stage oriented object detector. We propose the Anchor-Free Oriented Region Proposal Network (AFO-RPN) to generate high-quality oriented proposals without enormous predefined anchors. To deal with rotation problems, we also propose a new representation of an oriented box based on a polar coordinate system. To solve the severe appearance ambiguity problems faced by anchor-free methods, we use a Criss-Cross Attention Feature Pyramid Network (CCA-FPN) to exploit the contextual information of each pixel and its neighbors in order to enhance the feature representation. Extensive experiments on three public remote sensing benchmarks—DOTA, DIOR-R, and HRSC2016—demonstrate that our method can achieve very promising detection performance, with a mean average precision (mAP) of 80.68%, 67.15%, and 90.45%, respectively, on the benchmarks.

30

Xie, Lele, Yuliang Liu, Lianwen Jin, and Zecheng Xie. "DeRPN: Taking a Further Step toward More General Object Detection." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9046–53. http://dx.doi.org/10.1609/aaai.v33i01.33019046.

Abstract:

Most current detection methods have adopted anchor boxes as regression references. However, the detection performance is sensitive to the setting of the anchor boxes. A proper setting of anchor boxes may vary significantly across different datasets, which severely limits the universality of the detectors. To improve the adaptivity of the detectors, in this paper, we present a novel dimension-decomposition region proposal network (DeRPN) that can perfectly displace the traditional Region Proposal Network (RPN). DeRPN utilizes an anchor string mechanism to independently match object widths and heights, which is conducive to treating variant object shapes. In addition, a novel scale-sensitive loss is designed to address the imbalanced loss computations of different scaled objects, which can avoid the small objects being overwhelmed by larger ones. Comprehensive experiments conducted on both general object detection datasets (Pascal VOC 2007, 2012 and MS COCO) and scene text detection datasets (ICDAR 2013 and COCO-Text) all prove that our DeRPN can significantly outperform RPN. It is worth mentioning that the proposed DeRPN can be employed directly on different models, tasks, and datasets without any modifications of hyperparameters or specialized optimization, which further demonstrates its adaptivity. The code has been released at https://github.com/HCIILAB/DeRPN.

31

Wei, Xiaoyan, Yirong Wu, Fangmin Dong, Jun Zhang, and Shuifa Sun. "Developing an Image Manipulation Detection Algorithm Based on Edge Detection and Faster R-CNN." Symmetry 11, no. 10 (2019): 1223. http://dx.doi.org/10.3390/sym11101223.

Abstract:

Due to the wide availability of the tools used to produce manipulated images, a large number of digital images have been tampered with in various media, such as newspapers and social networks, which makes the detection of tampered images particularly important. Therefore, an image manipulation detection algorithm based on the Faster Region-based Convolutional Neural Network (Faster R-CNN) model combined with edge detection is proposed in this paper. In our algorithm, first, original tampered images and their detected edges were sent into symmetrical ResNet101 networks to extract tampering features. Then, these features were put into the Region of Interest (RoI) pooling layer. Instead of the RoI max pooling approach, the bilinear interpolation method was adopted to obtain the RoI region. After the RoI features of the original input images and the edge feature images were sent into a bilinear pooling layer for feature fusion, tampering classification was performed in a fully connected layer. Finally, a Region Proposal Network (RPN) was used to locate forgery regions. Experimental results on three different image manipulation datasets show that our proposed algorithm can detect tampered images more effectively than other existing image manipulation detection algorithms.

32

Wang, Shaohuang. "Real-Time Object Detection Using a Lightweight Two-Stage Detection Network with Efficient Data Representation." IECE Transactions on Emerging Topics in Artificial Intelligence 1, no. 1 (2024): 17–30. http://dx.doi.org/10.62762/tetai.2024.320179.

Abstract:

In this paper, we introduce a novel fast object detection framework, designed to meet the needs of real-time applications such as autonomous driving and robot navigation. Traditional processing methods often trade off accuracy against processing speed. To address this issue, we propose a hybrid data representation method that combines the computational efficiency of voxelization with the detail capture capability of direct data processing to optimize overall performance. Our detection framework comprises two main components: a Rapid Region Proposal Network (RPN) and a Refinement Detection Network (RefinerNet). The RPN is used to generate high-quality candidate regions, while the RefinerNet performs detailed analysis on these regions to improve detection accuracy. Additionally, we have implemented a variety of network optimization techniques, including lightweight network layers, network pruning, and model quantization, to increase processing speed and reduce computational resource consumption. Extensive testing on the KITTI and NEXET datasets has demonstrated the effectiveness of our method in enhancing both object detection accuracy and real-time processing speed. The experimental results show that, compared to existing technologies, our method performs exceptionally well across multiple evaluation metrics, especially in meeting the stringent processing-speed requirements of real-time applications.

33

Yun, Lu, Xinxin Zhang, Yuchao Zheng, Dahan Wang, and Lizhong Hua. "Enhance the Accuracy of Landslide Detection in UAV Images Using an Improved Mask R-CNN Model: A Case Study of Sanming, China." Sensors 23, no. 9 (2023): 4287. http://dx.doi.org/10.3390/s23094287.

Abstract:

Extracting high-accuracy landslide areas using deep learning methods from high spatial resolution remote sensing images is a hot topic in current research. However, the existing deep learning algorithms are affected by background noise and landslide scale effects during the extraction process, leading to poor feature extraction. To address this issue, this paper proposes an improved mask region-based convolutional neural network (Mask R-CNN) model to identify the landslide distribution in unmanned aerial vehicle (UAV) images. The improvement of the model mainly includes three aspects: (1) an attention mechanism, the convolutional block attention module (CBAM), is added to the backbone residual neural network (ResNet); (2) a bottom-up channel is added to the feature pyramid network (FPN) module; (3) the region proposal network (RPN) is replaced by guided anchoring (GA-RPN). Sanming City, China was selected as the study area for the experiments. The experimental results show that the improved model has a recall of 91.4% and an accuracy of 92.6%, which are 12.9% and 10.9% higher than those of the original Mask R-CNN model, respectively, indicating that the improved model is more effective in landslide extraction.

34

Wu, Ruihai, Kehan Xu, Chenchen Liu, Nan Zhuang, and Yadong Mu. "Localize, Assemble, and Predicate: Contextual Object Proposal Embedding for Visual Relation Detection." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 12297–304. http://dx.doi.org/10.1609/aaai.v34i07.6913.

Abstract:

Visual relation detection (VRD) aims to describe all interacting objects in an image using subject-predicate-object triplets. Critically, valid relations combinatorially grow in O(C²R) for C object categories and R relationships. The frequencies of relation triplets exhibit a long-tailed distribution, which inevitably leads to bias towards popular visual relations in the learned VRD model. To address this problem, we propose localize-assemble-predicate network (LAP-Net), which decomposes VRD into three sub-tasks: localizing individual objects, assembling and predicting the subject-object pairs. In the first stage of LAP-Net, Region Proposal Network (RPN) is used to generate a few class-agnostic object proposals. Next, these proposals are assembled to form subject-object pairs via a second Pair Proposal Network (PPN), in which we propose a novel contextual embedding scheme. The inner product between embedded representations faithfully reflects the compatibility between a pair of proposals, without estimating object and subject class. Top-ranked pairs from stage two are fed into a third sub-network, which precisely estimates the relationship. The whole pipeline except for the last stage is object-category-agnostic in localizing relationships in an image, alleviating the bias in popular relations induced by training data. Our LAP-Net can be trained in an end-to-end fashion. We demonstrate that LAP-Net achieves state-of-the-art performance on the VRD benchmark while maintaining high speed in inference.
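
The quadratic growth in the number of candidate triplets can be checked with a short worked example. This is only an illustrative count under the assumption that any category may appear as subject or object. The category and predicate names are made up, and `num_candidate_triplets` is a hypothetical helper.

```python
from itertools import product

def num_candidate_triplets(num_categories, num_relationships):
    """Number of subject-predicate-object triplets: C choices of subject,
    R choices of predicate, C choices of object, i.e. C * R * C."""
    return num_categories ** 2 * num_relationships

# Brute-force enumeration on a tiny, made-up vocabulary to confirm the formula.
categories = ["person", "horse", "hat"]
predicates = ["rides", "wears"]
triplets = list(product(categories, predicates, categories))

small_count = num_candidate_triplets(len(categories), len(predicates))
large_count = num_candidate_triplets(100, 70)
```

With 3 categories and 2 predicates there are 18 candidate triplets. With an assumed 100 categories and 70 predicates the count is 700,000, which shows why a long-tailed training set covers only a small part of the space.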

35

Seong, Ju-Hyeon, Soo-Hwan Lee, Won-Yeol Kim, and Dong-Hoan Seo. "High-Precision RTT-Based Indoor Positioning System Using RCDN and RPN." Sensors 21, no. 11 (2021): 3701. http://dx.doi.org/10.3390/s21113701.

Abstract:

Wi-Fi round-trip timing (RTT) has been applied to indoor positioning systems based on distance estimation. RTT has a higher reception instability than the received signal strength indicator (RSSI)-based fingerprint in non-line-of-sight (NLOS) environments with many obstacles, resulting in large positioning errors due to multipath fading. To solve these problems, in this paper, we propose a high-precision RTT-based indoor positioning system using an RTT compensation distance network (RCDN) and a region proposal network (RPN). The proposed method consists of a CNN-based RCDN for improving the prediction accuracy and learning rate of the received distances and a recurrent neural network-based RPN for real-time positioning, implemented in an end-to-end manner. The proposed RCDN collects and corrects a stable and reliable distance prediction value from each RTT transmitter by applying a scanning step to increase the reception rate of the TOF-based RTT with unstable reception. In addition, the user location is derived using the fingerprint-based location determination method through the RPN, in which division processing is applied to the RTT distances corrected in the RCDN using the characteristics of the fast-sampling period.

36

Chen, Lifu, Ting Weng, Jin Xing, et al. "A New Deep Learning Network for Automatic Bridge Detection from SAR Images Based on Balanced and Attention Mechanism." Remote Sensing 12, no. 3 (2020): 441. http://dx.doi.org/10.3390/rs12030441.

Abstract:

Bridge detection from Synthetic Aperture Radar (SAR) images has very important strategic significance and practical value, but there are still many challenges in end-to-end bridge detection. In this paper, a new deep learning-based network, namely the multi-resolution attention and balance network (MABN), is proposed to identify bridges from SAR images. It mainly includes three parts: the attention and balanced feature pyramid (ABFP) network, the region proposal network (RPN), and the classification and regression module. First, the ABFP network, which integrates the ResNeXt backbone network, balanced feature pyramid, and the attention mechanism, extracts various features from SAR images. Second, the extracted features are used by the RPN to generate candidate boxes of different resolutions, which are then fused. Furthermore, the candidate boxes are combined with the features extracted by the ABFP network through the region of interest (ROI) pooling strategy. Finally, the detection results of the bridges are produced by the classification and regression module. In addition, intersection over union (IOU) balanced sampling and balanced L1 loss functions are introduced for optimal training of the classification and regression network. In the experiment, TerraSAR data with 3-m resolution and Gaofen-3 data with 1-m resolution are used, and the results are compared with Faster R-CNN and SSD. The proposed network achieves the highest detection precision (P) and average precision (AP) among the three networks, at 0.877 and 0.896, respectively, with a recall rate (RR) of 0.917. Compared with the other two networks, the false alarm targets and missed targets of the proposed network are greatly reduced, so the precision is greatly improved.

37

Budiarsa, Rahmat, Retantyo Wardoyo, and Aina Musdholifah. "Face recognition with occluded face using improve intersection over union of region proposal network on Mask region convolutional neural network." International Journal of Electrical and Computer Engineering (IJECE) 14, no. 3 (2024): 3256. http://dx.doi.org/10.11591/ijece.v14i3.pp3256-3265.

Abstract:

Face recognition entails detecting and identifying facial attributes. The Mask region convolutional neural network (R-CNN) method is a prominent approach. While prior research predominantly delved into refining loss functions and perfecting object and face detection, recognizing and identifying faces using imperfect data has remained relatively unexplored. This study focuses on an occluded dataset comprising Indonesian faces, wherein 'occluded' denotes facial data that lacks complete visibility, encompassing instances where objects obscure faces or faces are partially cropped. This investigation involves a deliberate experiment that tailors the intersection over union (IoU) of the region proposal network (RPN) to suit the nuances of occluded Indonesian faces, thereby augmenting accuracy in recognition and segmentation tasks. The innovation lies in the strategic utilization of anchors, which involves the exclusion of anchors falling beyond the image borders to optimize computational efficiency. The outcomes of this research are striking: it showcases gains of 14.75%, 10.9%, and 12.97% in mean average precision (mAP), mean average recall (mAR), and F1-score, respectively, compared to the conventional Mask R-CNN approach. Notably, our proposed model elevates the average accuracy by 10% to 15% and decreases running time by 21%, a noteworthy enhancement compared to the preceding model. This progress is substantiated by validation on a dataset of 300 instances, reinforcing the robustness of our approach.
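
The exclusion of anchors that fall beyond the image borders can be sketched as follows. This is a generic illustration rather than the authors' implementation. The feature-map size, stride, anchor size, and aspect ratio are assumed values, and `generate_anchors` and `keep_inside_image` are hypothetical helper names.

```python
import math

def generate_anchors(feat_h, feat_w, stride, sizes, ratios):
    """Place one anchor per (size, ratio) pair at the centre of every feature-map cell.
    Anchors are (x1, y1, x2, y2) in image coordinates; ratio is width / height."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx = (x + 0.5) * stride
            cy = (y + 0.5) * stride
            for size in sizes:
                for ratio in ratios:
                    w = size * math.sqrt(ratio)
                    h = size / math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def keep_inside_image(anchors, img_w, img_h):
    """Keep only anchors that lie entirely within the image borders."""
    return [a for a in anchors
            if a[0] >= 0 and a[1] >= 0 and a[2] <= img_w and a[3] <= img_h]

# Assumed toy setting: 64x64 image, 4x4 feature map (stride 16), one 32x32 anchor per cell.
anchors = generate_anchors(feat_h=4, feat_w=4, stride=16, sizes=(32,), ratios=(1.0,))
inside = keep_inside_image(anchors, img_w=64, img_h=64)
```

In this toy setting only the four anchors centred on the inner cells survive out of sixteen, so later steps such as IoU matching against ground truth run on a quarter of the anchors.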

38

Liu, Chuanbin, Hongtao Xie, Zheng-Jun Zha, Lingfeng Ma, Lingyun Yu, and Yongdong Zhang. "Filtration and Distillation: Enhancing Region Attention for Fine-Grained Visual Categorization." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (2020): 11555–62. http://dx.doi.org/10.1609/aaai.v34i07.6822.

Abstract:

Delicate attention to the discriminative regions plays a critical role in Fine-Grained Visual Categorization (FGVC). Unfortunately, most of the existing attention models perform poorly in FGVC, due to pivotal limitations in discriminative region proposing and region-based feature learning. 1) The discriminative regions are predominantly located based on the filter responses over the images, which cannot be directly optimized with a performance metric. 2) Existing methods train the region-based feature extractor as a one-hot classification task individually, while neglecting the knowledge from the entire object. To address the above issues, in this paper, we propose a novel “Filtration and Distillation Learning” (FDL) model to enhance the region attention of discriminative parts for FGVC. Firstly, a Filtration Learning (FL) method is put forward for discriminative part region proposing based on the matchability between proposing and predicting. Specifically, we utilize the proposing-predicting matchability as the performance metric of the Region Proposal Network (RPN), thus enabling a direct optimization of the RPN to filtrate the most discriminative regions. In detail, the object-based feature learning and region-based feature learning are formulated as “teacher” and “student”, which can furnish better supervision for region-based feature learning. Accordingly, our FDL can enhance the region attention effectively, and the overall framework can be trained end-to-end without either object or part annotations. Extensive experiments verify that FDL yields state-of-the-art performance under the same backbone as the most competitive approaches on several FGVC tasks.

39

Qu, Hao, Lilian Zhang, Xuesong Wu, Xiaofeng He, Xiaoping Hu, and Xudong Wen. "Multiscale Object Detection in Infrared Streetscape Images Based on Deep Learning and Instance Level Data Augmentation." Applied Sciences 9, no. 3 (2019): 565. http://dx.doi.org/10.3390/app9030565.

Abstract:

The development of object detection in infrared images has attracted more attention in recent years. However, there are few studies on multi-scale object detection in infrared street scene images. Additionally, the lack of high-quality infrared datasets hinders research into such algorithms. In order to solve these issues, we first make a series of modifications based on the Faster Region-Convolutional Neural Network (R-CNN): a double-layer region proposal network (RPN) is proposed to predict proposals of different scales on both fine and coarse feature maps. Secondly, a multi-scale pooling module is introduced into the backbone of the network to explore the response of objects on different scales. Furthermore, the inception4 module and the position sensitive region of interest (ROI) align (PSalign) pooling layer are utilized to explore richer features of the objects. Thirdly, this paper proposes instance level data augmentation, which takes into account the imbalance between categories while enlarging the dataset. In the training stage, the online hard example mining method is utilized to further improve the robustness of the algorithm in complex environments. The experimental results show that, compared with the baseline, our detection method has state-of-the-art performance.

APA, Harvard, Vancouver, ISO, and other styles

40

Monicaa, J. "AI-Powered Waste Detection and Localization Using Region Proposal Network and CNN." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 04 (2025): 1–9. https://doi.org/10.55041/ijsrem45254.

Full text

Abstract:

To reduce environmental pollution caused by improper waste disposal, efficient waste management solutions are essential. Traditional methods rely on manual monitoring, which is time-consuming and ineffective. Advancements in artificial intelligence and deep learning enable automated waste detection and localization, improving efficiency in waste management. Faster R-CNN, an advanced object detection model, enhances precision in recognizing and localizing waste in visual data. By utilizing deep learning techniques, waste detection achieves high accuracy, reducing dependence on manual inspections. The integration of Faster R-CNN with image processing techniques allows for real-time identification of waste objects in complex environments. Localization is achieved by drawing bounding boxes around detected waste, providing precise positional information. The Region Proposal Network (RPN) plays a key role in this process by generating candidate object regions through anchor boxes and refining them based on object scores. TensorFlow’s Object Detection API facilitates model training and optimization, ensuring accurate recognition of waste materials. High-performance convolutional neural networks enhance feature extraction, distinguishing waste from non-waste objects effectively. The use of deep learning in image-based waste detection supports scalable waste management solutions. Automated waste detection and localization contribute to sustainable urban development by enabling proactive waste management strategies. The ability to accurately identify and mark waste locations enhances the efficiency of cleaning processes and reduces environmental hazards. By analyzing waste distribution patterns, authorities can optimize resource allocation and improve waste disposal planning. The adoption of AI-driven approaches in waste detection minimizes human effort while promoting a cleaner and healthier environment.
Key Words: AI-Driven waste detection, Faster R-CNN, Region Proposal Network, TensorFlow object detection API, Image processing and Localization.
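
The anchor-box step mentioned in this abstract can be pictured with a minimal sketch. It assumes the commonly used Faster R-CNN layout (a stride of 16 pixels, three scales and three aspect ratios per feature-map cell); these values and the function name `generate_anchors` are illustrative assumptions, not details taken from the article.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Tile anchors over a feature map; returns (x1, y1, x2, y2) tuples.

    stride, scales and ratios are assumed defaults, not values from the article.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell in input-image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the box area stays at scale**2.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # 2 * 3 cells, 9 anchors per cell
```

In a full detector, each anchor would then receive an objectness score and box offsets from the RPN head; that learned part is omitted here.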

APA, Harvard, Vancouver, ISO, and other styles

41

Budiarsa, Rahmat, Retantyo Wardoyo, and Aina Musdholifah. "Face recognition with occluded face using improve intersection over union of region proposal network on Mask region convolutional neural network." International Journal of Electrical and Computer Engineering (IJECE) 14, no. 3 (2024): 3256–65. https://doi.org/10.11591/ijece.v14i3.pp3256-3265.

Full text

Abstract:

Face recognition entails detecting and identifying facial attributes. The Mask region convolutional neural network (R-CNN) method is a prominent approach. While prior research predominantly delved into refining loss functions and perfecting object and face detection, recognizing and identifying faces using imperfect data has remained relatively unexplored. This study focuses on an occluded dataset comprising Indonesian faces, wherein 'occluded' denotes facial data that lacks complete visibility, encompassing instances where objects obscure faces or where faces are partially cropped. This investigation involves a deliberate experiment that tailors the intersection over union (IoU) of the region proposal network (RPN) to suit the nuances of occluded Indonesian faces, thereby augmenting accuracy in recognition and segmentation tasks. The innovation lies in the strategic utilization of anchors, which involves the exclusion of anchors falling beyond the image borders to optimize computational efficiency. The outcomes of this research are striking: it showcases a remarkable 14.75%, 10.9%, and 12.97% surge in mean average precision (mAP), mean average recall (mAR), and F1-score, respectively, compared to the conventional Mask R-CNN approach. Notably, our proposed model elevates the average accuracy by 10% to 15% and decreases running time by 21%, a noteworthy enhancement compared to the preceding model. This progress is substantiated by validation on a dataset of 300 instances, reinforcing the robustness of our approach.
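
The exclusion of anchors that fall beyond the image borders can be sketched as follows. This is only an illustration of the border filter itself; the abstract does not say how the filter interacts with the tailored IoU, and the function name `keep_inside_anchors` and the `allowed_border` tolerance are assumptions introduced here.

```python
def keep_inside_anchors(anchors, img_w, img_h, allowed_border=0):
    """Keep only anchors (x1, y1, x2, y2) that lie inside the image.

    allowed_border is a hypothetical tolerance in pixels; 0 means strict.
    """
    kept = []
    for x1, y1, x2, y2 in anchors:
        inside = (x1 >= -allowed_border and y1 >= -allowed_border and
                  x2 <= img_w + allowed_border and
                  y2 <= img_h + allowed_border)
        if inside:
            kept.append((x1, y1, x2, y2))
    return kept


candidates = [(0, 0, 10, 10), (-5, 0, 10, 10), (90, 90, 110, 110)]
print(keep_inside_anchors(candidates, img_w=100, img_h=100))
```

Dropping these anchors before label assignment means fewer boxes need IoU computation, which is one plausible source of the reduced running time the abstract reports.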

APA, Harvard, Vancouver, ISO, and other styles

42

Aditya P, Anky, Suryo Adhi Wibowo, and Rissa Rahmania. "Investigasi pengaruh Step Training pada Skema Same-Padding untuk Metode Faster R-CNN dalam Teknologi Augmented Reality." Jurnal Ilmiah FIFO 12, no. 2 (2021): 128. http://dx.doi.org/10.22441//fifo.2020.v12i2.002.

Full text

Abstract:

Augmented Reality (AR) is a technology that combines real-world dimensions with virtual-world dimensions displayed in real time. In the AR environment, the interaction techniques used can vary. Marker-based AR is one type of AR that allows virtual objects to be displayed in the real world by using markers as pointers. Marker-based AR requires an object detection method for tracking the markers. In this study, a system that can detect objects in the form of fingertips is designed. The system is designed using the Faster Region-based Convolutional Neural Network (Faster R-CNN) method. Faster R-CNN is an object detection method that combines the Fast R-CNN method and the Region Proposal Network (RPN). The resulting detection parameters, namely the coordinates x, y, width, and length, are used for tracking. This research uses the Faster R-CNN method because it has a faster computing speed than the previous method, namely Particle Filter. The Faster R-CNN method uses the ResNet architecture as the core of the CNN. The system configurations tested are 25K, 50K, and 75K training steps with the same-padding scheme. The data are taken from a video and consist of 10800 training data and 3600 test data. The best system configuration based on parameter priority for AR technology is obtained at 50K training steps.
Keywords: augmented reality, convolutional neural network, faster region-based convolutional neural network, region proposal network, ResNet.

APA, Harvard, Vancouver, ISO, and other styles

43

Ma, Shiyuan, Donglin Qian, Kai Ye, and Shengchuan Zhang. "CAKE: Category Aware Knowledge Extraction for Open-Vocabulary Object Detection." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 6 (2025): 5982–90. https://doi.org/10.1609/aaai.v39i6.32639.

Full text

Abstract:

The open-vocabulary object detection (OVOD) task aims to detect objects of novel categories beyond the base categories in the training set. To this end, the detector needs access to image-text pairs containing rich semantic information or to a vision-language pre-trained model (VLM) learned on them. Recent OVOD methods rely on knowledge distillation from VLMs. However, there are two main problems in current methods: (1) Current knowledge distillation frameworks fail to take advantage of the global category information of VLMs and thus fail to learn category-specific knowledge. (2) Due to overfitting to base categories during training, current OVOD networks generally suppress novel categories as background. To address these two problems, we propose a Category Aware Knowledge Extraction framework (CAKE), which consists of a Category-Specific Knowledge Distillation branch (CSKD) and a Category Generalization Region Proposal Network (CG-RPN). CSKD can more fully extract strongly category-related information through category-specific distillation, and it is also conducive to filtering the exclusion problem between individuals of the same category; in this process, the model constructs a category-specific feature set to maintain high-quality category features. CG-RPN leverages the guidance of the feature set to adjust the confidence scores of region proposals, thereby mining proposals that potentially contain objects of novel categories. Extensive experiments show that our method can plug and play well with many existing methods and significantly improve their detection performance. Moreover, our CAKE framework reaches state-of-the-art performance on the OV-COCO and OV-LVIS datasets.

APA, Harvard, Vancouver, ISO, and other styles

44

Ismail, Azlan, Taufik Rahmat, and Sharifah Aliman. "CHEST X-RAY IMAGE CLASSIFICATION USING FASTER R-CNN." MALAYSIAN JOURNAL OF COMPUTING 4, no. 1 (2019): 225. http://dx.doi.org/10.24191/mjoc.v4i1.6095.

Full text

Abstract:

Chest x-ray image analysis is a common medical imaging exam used to assess different pathologies. An automated solution for the analysis can help minimize workloads, improve efficiency, and reduce the potential for reading errors. Many methods have been proposed to address chest x-ray image classification and detection. However, the application of regional-based convolutional neural networks (CNN) is currently limited. Thus, we propose an approach to classify chest x-ray images into one of two categories, pathological or normal, based on the Faster Regional-CNN model. This model utilizes a Region Proposal Network (RPN) to generate region proposals and perform image classification. By applying this model, we can potentially achieve two key goals: high confidence in the classification and reduced computation time. The results show that the applied model achieved higher accuracy than the medical representatives on random chest x-ray images. The classification model is also reasonably effective in distinguishing between finding and normal chest x-ray images captured through a live webcam.

APA, Harvard, Vancouver, ISO, and other styles

45

Pan, Qiyu, Keyi Fu, and Gaocai Wang. "Study on Few-Shot Object Detection Approach Based on Improved RPN and Feature Aggregation." Applied Sciences 15, no. 7 (2025): 3734. https://doi.org/10.3390/app15073734.

Full text

Abstract:

In this paper, we propose an improved Region Proposal Network (RPN) by introducing a metric-based nonlinear classifier to compute the similarity between features extracted from the backbone network and those of new classes. This enhancement aims to improve the detection precision for candidate boxes of new classes and filter out candidate boxes with high Intersection over Union (IoU). Simultaneously, we introduce an attention-based Feature Aggregation Module (AFM) in Region of Interest (RoI) Align to aggregate feature information from different levels, obtaining more comprehensive information and feature representation to address the issue of missing feature information due to scale differences. Combining these two improvements, we present a novel few-shot object detection algorithm, IFA-FSOD. We conduct extensive experiments on datasets. Compared to some mainstream few-shot object detection algorithms, the IFA-FSOD algorithm can select more accurate candidate boxes, addressing the issues of missed high-IoU candidate boxes and incomplete feature information capture, resulting in higher precision.
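
For readers unfamiliar with the IoU criterion referred to here, a minimal sketch of the overlap measure and of selecting high-IoU candidate boxes is given below. It does not reproduce the paper's metric-based classifier or AFM; the helper names `iou` and `select_high_iou` and the 0.5 threshold are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Intersection is empty when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_high_iou(candidates, target, threshold=0.5):
    """Return candidates whose IoU with target meets the (assumed) threshold."""
    return [c for c in candidates if iou(c, target) >= threshold]


target = (0, 0, 10, 10)
candidates = [(0, 0, 10, 10), (5, 0, 15, 10), (1, 1, 10, 10)]
print(select_high_iou(candidates, target))
```

With the sample boxes, the shifted box overlaps the target over only a third of their union and is dropped, while the two closely matching boxes are kept.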

APA, Harvard, Vancouver, ISO, and other styles

46

Lu, Manhuai, and Liqin Chen. "Efficient Object Detection Algorithm in Kitchen Appliance Scene Images Based on Deep Learning." Mathematical Problems in Engineering 2020 (December 15, 2020): 1–12. http://dx.doi.org/10.1155/2020/6641491.

Full text

Abstract:

The accuracy of object detection based on kitchen appliance scene images can suffer severely from external disturbances such as various levels of specular reflection, uneven lighting, and spurious lighting, as well as internal scene-related disturbances such as invalid edges and pattern information unrelated to the object of interest. The present study addresses these unique challenges by proposing an object detection method based on an improved Faster R-CNN algorithm. The improved method can identify object regions scattered in various areas of complex appliance scenes quickly and automatically. In this paper, we put forward a feature enhancement framework named deeper region proposal network (D-RPN). In D-RPN, a feature enhancement module is designed to extract the feature information of an object in a kitchen appliance scene more effectively. Then, we reconstruct a U-shaped network structure using a series of feature enhancement modules. We have evaluated the proposed D-RPN on a dataset we created, which includes all kinds of kitchen appliance control panels captured in natural scenes by an image collector. In our experiments, the best-performing object detection method obtained a mean average precision (mAP) of 89.84% on the testing dataset. The test results show that the proposed improved algorithm achieves higher detection accuracy than state-of-the-art object detection methods. Finally, our proposed detection method can further be used in text recognition.

APA, Harvard, Vancouver, ISO, and other styles

47

Sun, Haoze, Tianqing Chang, Lei Zhang, Guozhen Yang, Bin Han, and Junwei Chen. "Armored Target Detection in Battlefield Environment Based on Top-Down Aggregation Network and Hierarchical Scale Optimization." International Journal of Pattern Recognition and Artificial Intelligence 33, no. 04 (2019): 1950007. http://dx.doi.org/10.1142/s0218001419500071.

Full text

Abstract:

Armored equipment plays a crucial role in the ground battlefield. The fast and accurate detection of enemy armored targets is significant for taking the initiative in the battlefield. Compared to general object detection and vehicle detection, armored target detection in a battlefield environment is more challenging due to the long observation distance and the complicated environment. In this paper, an accurate and robust automatic detection method is proposed to detect armored targets in a battlefield environment. Firstly, inspired by the Feature Pyramid Network (FPN), we propose a top-down aggregation (TDA) network which enhances shallow feature maps by aggregating semantic information from deeper layers. Then, using the proposed TDA network in a basic Faster R-CNN framework, we explore further optimization of the approach for armored target detection: for the Region of Interest (RoI) Proposal Network (RPN), we propose a multi-branch RPN framework to generate proposals that match the scale of armored targets and the specific receptive field of each aggregated layer, and design a hierarchical loss for the multi-branch RPNs; for the RoI Classifier Network (RCN), we apply RoI pooling on the single finest-scale feature map and construct a light and fast detection network. To evaluate our method, comparative experiments with state-of-the-art detection methods were conducted on a challenging dataset of images with armored targets. The experimental results demonstrate the effectiveness of the proposed method in terms of detection accuracy and recall rate.

APA, Harvard, Vancouver, ISO, and other styles

48

林亮宇 and 林朝興. "利用多尺度感興趣區域之細微關係提供圖片字幕" [Image captioning using fine-grained relationships among multi-scale regions of interest]. 理工研究國際期刊 13, no. 2 (2023): 019–38. http://dx.doi.org/10.53106/222344892023101302003.

Full text

Abstract:

With the rapid development of machine learning, the technique of Image Captioning is becoming more and more advanced. Recent research on Image Captioning introduces Region Proposal Networks (RPN) and the Attention Mechanism. Through RPN, we can extract features of specific object regions in the image and reduce the probability of noise being treated as visual features. The attention mechanism makes the models focus more on the mapping between objects and captions. However, the current research results have deficiencies. Both RPN and the Attention Mechanism focus only on single object regions and lack the finer-grained visual features between objects. These deficiencies cause the caption generator to produce unclear relationship descriptions. In this paper, to improve the fineness of relationship descriptions in Image Captioning, we propose an Image Captioning model that generates sentences with multi-scale regions of interest (ROIs) between two different objects. Our proposed architecture includes Region Proposal Networks, Fully Convolutional Neural Networks (FCNN), and Long Short-term Memory (LSTM) cells. Compared to the existing research results, we extract not only object regions but also multi-scale ROIs between two different objects as visual features. Some of the multi-scale ROIs are noise, which can be screened out by utilizing Intersection-over-Union (IoU). Each ROI is passed through the FCNN to extract visual features, then sorted fusion features are obtained with a fusion mechanism and a sorting network, and finally an LSTM learns the transformation from these features to a whole sentence. With hierarchical attribute supervision in the training stage, the caption generator can focus on learning how to generate fine-grained attributes. The architecture proposed in this study can use more precise verbs to describe object actions in dynamic pictures. Furthermore, our architecture achieves higher scores on n-gram-based metrics.

APA, Harvard, Vancouver, ISO, and other styles

49

Xiao, Yi, Xinqing Wang, Peng Zhang, Fanjie Meng, and Faming Shao. "Object Detection Based on Faster R-CNN Algorithm with Skip Pooling and Fusion of Contextual Information." Sensors 20, no. 19 (2020): 5490. http://dx.doi.org/10.3390/s20195490.

Full text

Abstract:

Deep learning is currently the mainstream method of object detection. The Faster region-based convolutional neural network (Faster R-CNN) has a pivotal position in deep learning. It has impressive detection effects in ordinary scenes. However, under special conditions, detection performance can still be unsatisfactory, such as when the object is occluded, deformed, or small. This paper proposes a novel and improved algorithm based on the Faster R-CNN framework combined with skip pooling and fusion of contextual information. This algorithm can improve the detection performance under special conditions on the basis of Faster R-CNN. The improvement mainly has three parts: The first part adds a context information feature extraction model after the conv5_3 of the convolutional layer; the second part adds skip pooling so that the former can fully obtain the contextual information of the object, especially for situations where the object is occluded and deformed; and the third part replaces the region proposal network (RPN) with a more efficient guided anchor RPN (GA-RPN), which can maintain the recall rate while improving the detection performance. The latter can obtain more detailed information from different feature layers of the deep neural network algorithm, and is especially aimed at scenes with small objects. Compared with Faster R-CNN, the you only look once series (such as YOLOv3), the single shot detector (such as SSD512), and other object detection algorithms, the algorithm proposed in this paper has an average improvement of 6.857% on the mean average precision (mAP) evaluation index while maintaining a certain recall rate. This strongly proves that the proposed method has a higher detection rate and detection efficiency in this case.

APA, Harvard, Vancouver, ISO, and other styles

50

Yao, Canming, Pengfei Xie, Lei Zhang, and Yuyuan Fang. "ATSD: Anchor-Free Two-Stage Ship Detection Based on Feature Enhancement in SAR Images." Remote Sensing 14, no. 23 (2022): 6058. http://dx.doi.org/10.3390/rs14236058.

Full text

Abstract:

Synthetic aperture radar (SAR) ship detection in harbors is challenging because the backscattering of ship targets is similar to that of the surrounding background interference. Prevalent two-stage ship detectors usually use an anchor-based region proposal network (RPN) to search for possible regions of interest across the whole image. However, most pre-defined anchor boxes are redundantly and randomly tiled on the image, which results in low-quality object proposals. To address these issues, this paper proposes a novel detection method combined with two feature enhancement modules to improve ship detection capability. First, we propose a flexible anchor-free detector (AFD) to generate fewer but higher-quality proposals around the object centers in a keypoint prediction manner, which completely avoids the complicated computation in RPN, such as calculating overlaps related to anchor boxes. Second, we leverage the proposed spatial insertion attention (SIA) module to enhance the feature discrimination between ship targets and background interference. It accordingly encourages the detector to pay attention to the localization accuracy of ship targets. Third, a novel weighted cascade feature fusion (WCFF) module is proposed to adaptively aggregate multi-scale semantic features and thus help the detector boost the detection performance of multi-scale ships in complex scenes. Finally, combining the newly designed AFD and SIA/WCFF modules, we present a new detector, named anchor-free two-stage ship detector (ATSD), for SAR ship detection under complex background interference. Extensive experiments on two public datasets, i.e., SSDD and HRSID, verify that our ATSD delivers state-of-the-art detection performance over conventional detectors.

APA, Harvard, Vancouver, ISO, and other styles
