
Journal articles on the topic 'AReM dataset'


Consult the top 50 journal articles for your research on the topic 'AReM dataset.'


1

Tan, Yi-Fei, Xiaoning Guo, and Soon-Chang Poh. "Time series activity classification using gated recurrent units." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 4 (2021): 3551–58. https://doi.org/10.11591/ijece.v11i4.pp3551-3558.

Abstract:
The elderly population is growing and is projected to outnumber the youth in the future, and much research on assisted living technology for the elderly has been carried out. One of the focus areas is activity monitoring of the elderly. The AReM dataset is a time series activity recognition dataset covering seven different types of activities: bending 1, bending 2, cycling, lying, sitting, standing, and walking. In the original paper, the authors used a many-to-many recurrent neural network for activity recognition. Here, we introduce a time series classification method in which gated recurrent units (GRUs) with a many-to-one architecture are used for activity classification. The experimental results showed an excellent accuracy of 97.14%.
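For readers who want to see the shape of such a model, the following is a minimal many-to-one GRU classifier sketch in Keras. The layer sizes and training settings are illustrative assumptions, not the configuration from the paper; the input shape follows the usual AReM layout of 480 time steps with 6 RSS-derived features per sequence and 7 activity classes.

```python
# Hypothetical many-to-one GRU classifier in the spirit of the paper above
# (assumed shapes: 480 time steps x 6 RSS features, 7 activity classes).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_steps, n_features, n_classes = 480, 6, 7

model = keras.Sequential([
    layers.Input(shape=(n_steps, n_features)),
    layers.GRU(64),              # many-to-one: only the last hidden state is kept
    layers.Dense(32, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# X: (n_series, 480, 6) sequences; y: integer activity labels 0..6.
X = np.random.rand(88, n_steps, n_features).astype("float32")  # placeholder data
y = np.random.randint(0, n_classes, size=88)                   # placeholder labels
model.fit(X, y, epochs=5, batch_size=8, validation_split=0.2)
```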
2

Delima, Rosa, and Antonius Rachmat Chrismanto. "Otomatisasi Pembentukan Class Diagram dengan Pendekatan Metode Pemrosesan Teks dan Algoritma CombineTF." Jurnal Edukasi dan Penelitian Informatika (JEPIN) 10, no. 1 (2024): 120. http://dx.doi.org/10.26418/jp.v10i1.72518.

Abstract:
Requirements specification is an important part of the software requirements engineering process: it connects the system analyst with the programmers who will develop the system. Requirements engineering is time-consuming and demands considerable effort from the system analyst, so the analyst's work can be made more efficient and faster with a tool that automates the process. This study automatically generates a requirements specification, in the form of a class diagram, from requirements data, helping analysts in specifying requirements. The resulting specification extends the Automatic Requirements Engineering Model (AREM). The class diagram is built in three stages: constructing the class diagram from the requirements data, handling duplicate objects in the diagram, and refining the class diagram. The first stage uses a text-processing approach, while duplicate-object handling uses a term-frequency (TF) approach together with the CombineTF and Jaro-Winkler algorithms. The study uses a requirements dataset for the development of a cooperative information system and successfully develops a model for automating class diagram construction. The results show that duplicate-object handling resolved 62.5% of duplicate objects, with a precision of 0.94 and an accuracy of 0.97 at an algorithm threshold of ≥ 0.8.
3

Matosak, Bruno Menini, Getachew Workineh Gella, and Stefan Lang. "SenForFlood: A New Global Dataset for Flooded Area Detection." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-M-7-2025 (May 24, 2025): 97–102. https://doi.org/10.5194/isprs-archives-xlviii-m-7-2025-97-2025.

Abstract:
Floods are devastating hazards that cause human displacement, loss of life, and damage to property. Accurate information about the extent and severity of floods is essential for planning proper humanitarian emergency assistance. Although integrating Earth observation with deep learning models supports rapid information extraction, mapping floods accurately is still challenging because extensive, representative datasets with high-quality labels are needed to train models. While some datasets provide satellite imagery for flood events, they are typically limited to a few floods or to specific regions, and most provide images captured only during the flood event, which hinders methods that rely on detecting change. Therefore, in this work we created a global dataset for mapping flood extent (SenForFlood), including images before and during floods from Sentinel-1 and -2, terrain elevation and slope, land use and land cover (LULC), and flood masks. The samples included for each flood event were selected by analysts considering the quality of the flood mask and the completeness of the available satellite imagery. The dataset incorporates data from over 350 distinct flood events, encompassing all continents except Antarctica. The dataset was tested by training a convolutional neural network for detecting floods without permanent water bodies, and the results are discussed. We expect that the dataset will facilitate the development of robust, transferable models for automatic flood mapping, thereby contributing to humanitarian emergency response in crisis situations. Dataset download instructions, as well as code for easy usage, are available at https://github.com/menimato/SenForFlood.
4

Wang, Juan, Zhibin Zhang, and Yanjuan Li. "Constructing Phylogenetic Networks Based on the Isomorphism of Datasets." BioMed Research International 2016 (2016): 1–7. http://dx.doi.org/10.1155/2016/4236858.

Abstract:
Constructing rooted phylogenetic networks from rooted phylogenetic trees has become an important problem in molecular evolution. Many methods have been presented in this area, of which the most efficient are based on the incompatible graph, such as CASS, LNETWORK, and BIMLR. This paper studies what the methods based on the incompatible graph have in common, the relationship between the incompatible graph and the phylogenetic network, and the topologies of incompatible graphs. We can find all the simplest datasets for a topology G and construct a network for every such dataset. For any dataset C, we can compute a network from the network representing the simplest dataset that is isomorphic to C. This process saves time for the algorithms when constructing networks.
5

Núñez-Casillas, Laia, José Rafael García Lázaro, José Andrés Moreno-Ruiz, and Manuel Arbelo. "A Comparative Analysis of Burned Area Datasets in Canadian Boreal Forest in 2000." Scientific World Journal 2013 (2013): 1–13. http://dx.doi.org/10.1155/2013/289056.

Abstract:
The turn of the new millennium was accompanied by a particularly diverse group of burned area datasets from different sensors in the Canadian boreal forests, brought together in a year of low global fire activity. This paper provides an assessment of spatial and temporal accuracy by means of a fire-by-fire comparison of the following: two burned area datasets obtained from SPOT-VEGETATION (VGT) imagery, a MODIS Collection 5 burned area dataset, and three different datasets obtained from NOAA-AVHRR. Results showed that burned area data from MODIS provided accurate dates of burn but great omission error, partially caused by calibration problems. One of the VGT-derived datasets (L3JRC) represented the largest number of fire sites in spite of its great overall underestimation, whereas the GBA2000 dataset achieved the best burned area quantification, both showing delayed and very variable fire timing. Spatial accuracy was comparable between the 5 km and the 1 km AVHRR-derived datasets but was remarkably lower in the 8 km dataset, leading us to conclude that at higher spatial resolutions, temporal accuracy was lower. The probable methodological and contextual causes of these differences were analyzed in detail.
6

Schwambach, Gislene Cássia dos Santos, Michele Kremer Sott, and Rodrigo Evaldo Schwambach. "Wearable devices and workplace productivity: a bibliometric analysis of their integration into professional environments." Dataset Reports 3, no. 1 (2024): 101–6. http://dx.doi.org/10.58951/dataset.2024.018.

Abstract:
This study analyzes workers' perceptions and acceptance of the use of wearable devices in the workplace. A bibliometric review supported by complex network analysis was carried out, through which the driving themes of the area were identified. The results indicate the increase in the use of these technologies and the factors linked to employee acceptance or rejection. Workers' perceptions and the potential benefits of wearable technologies are also discussed. The findings reveal factors influencing technology acceptance and highlight organizational and technological characteristics that facilitate adoption for effective daily use. The study contributes to the literature by evaluating the feasibility and acceptance of wearable technologies within companies. It underscores that the lack of employee involvement in device selection is a significant barrier to adoption.
7

Watanabe, Tatsuhisa, Tomoharu Nakashima, and Yoshifumi Kusunoki. "Change Detection for Area Surveillance Using a Moving Camera." Anwendungen und Konzepte der Wirtschaftsinformatik, no. 14 (December 9, 2021): 7. http://dx.doi.org/10.26034/lu.akwi.2021.3319.

Abstract:
This paper tackles area surveillance with a moving camera through change detection. None of the existing change detection datasets matches a surveillance scenario in which a camera is mounted on a moving platform and pointed in the direction of motion. This paper therefore creates a new dataset that includes several challenging cases. For this dataset, the paper employs a composable method and proposes some of its components. To evaluate the proposed components, corresponding classic methods were also tested on the dataset; the proposed components outperformed them. Moreover, the paper investigates the relationship between the parameters of the components and their performance.
8

Cunha, Hanna Diniz, Andrea Diniz da Silva, Bernardo Braga Martins, et al. "Detection of slums in Rio de Janeiro through satellite images." Dataset Reports 3, no. 1 (2024): 107–13. http://dx.doi.org/10.58951/dataset.2024.019.

Abstract:
According to UN-Habitat, more than one billion people live in informal settlements worldwide, of which 200 million live in Africa and another 100 million in Latin America, mainly in countries such as Brazil, Mexico, Colombia, Peru, and Argentina. Rio de Janeiro has 1,074 favelas, representing 22% of the city's total population, making it the Brazilian municipality with the highest percentage of people living in favelas. Ensuring the human rights of the populations living in these settlements through access to basic services, via programs and public policies, depends on timely and reliable data. However, despite countries spending decades establishing their national statistical systems, usually based on collecting data directly from individuals, the data produced in traditional ways do not promptly portray the dynamics of these populations. As an alternative, we combined free satellite imagery with machine learning and deep learning to identify the area occupied by favelas in the city of Rio de Janeiro. We compared the results of eight distinct segmentation models using IoU and F1 as metrics. Among the evaluated methods, two stood out for their performance: GradientBoost and XGBoost.
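As a reference for the evaluation protocol mentioned above, here is a minimal sketch of how IoU and F1 can be computed for binary segmentation masks; the array shapes and toy values are illustrative assumptions, not details from the paper.

```python
# Illustrative IoU and F1 computation for binary segmentation masks.
import numpy as np

def iou_and_f1(pred: np.ndarray, truth: np.ndarray) -> tuple:
    """pred, truth: boolean arrays of the same shape (True = favela pixel)."""
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    iou = tp / (tp + fp + fn)            # intersection over union
    f1 = 2 * tp / (2 * tp + fp + fn)     # Dice / F1 score
    return float(iou), float(f1)

# Toy example: two 4x4 masks overlapping in 3 of 5 predicted pixels.
pred = np.zeros((4, 4), dtype=bool); pred[0, :3] = True; pred[1, :2] = True
truth = np.zeros((4, 4), dtype=bool); truth[0, :3] = True; truth[2, :2] = True
print(iou_and_f1(pred, truth))  # IoU = 3/7 ~ 0.43, F1 = 6/10 = 0.6
```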
9

Hua, Lei, Shicheng Li, Deng Gao, and Wangjun Li. "Uncertainties of Global Historical Land Use Datasets in Pasture Reconstruction for the Tibetan Plateau." Remote Sensing 14, no. 15 (2022): 3777. http://dx.doi.org/10.3390/rs14153777.

Abstract:
Global historical land use datasets have been widely used in global and regional environmental change studies. Historical pasture data are essential components of these spatially explicit global datasets, yet their uncertainties have not been well evaluated. Using a livestock-based historical pasture dataset for the Tibetan Plateau (TP), we evaluated the uncertainties of representative global historical land use datasets in pasture reconstruction for the TP over the past 300 years, in terms of pasture area estimation and spatial pattern mapping. We found that only the Sustainability and the Global Environment (SAGE) dataset can roughly reflect the temporal and spatial characteristics of historical pasture changes on the TP. The History Database of the Global Environment (HYDE) version 3.2 and the Pongratz Julia (PJ) datasets dramatically overestimated pasture area for the TP, with maximum area ratios of about 221% and 291%, respectively, while the Kaplan and Krumhardt 2010 (KK10) dataset dramatically underestimated it, with a minimum area ratio of only 9%. As for the spatial pattern, all of these global datasets markedly overestimated the spatial scope of grazing activities. The KK10 dataset unreasonably allocated pasture to forest areas in southeastern Tibet because only climate and soil factors were considered in assessing land suitability for grazing. Using population to estimate pasture area, and only natural factors to allocate pasture area into grids, is unsuitable for TP historical pasture reconstruction. In the future, information more directly related to grazing activities (e.g., livestock numbers and their spatial distribution) and social-cultural factors, including technology and diet, should be used for area estimation and spatial pattern mapping to improve the accuracy of pasture data in these global datasets.
10

Chen, Yijun, Shenxin Zhao, Lihua Zhang, and Qi Zhou. "Quality Assessment of Global Ocean Island Datasets." ISPRS International Journal of Geo-Information 12, no. 4 (2023): 168. http://dx.doi.org/10.3390/ijgi12040168.

Abstract:
Ocean island data are essential to the conservation and management of islands and coastal ecosystems, and have also been adopted by the United Nations as a Sustainable Development Goal (SDG 14). Currently, two categories of island datasets, i.e., the global shoreline vector (GSV) and OpenStreetMap (OSM), are freely available on a global scale. However, few studies have focused on assessing and comparing the data quality of these two datasets, which is the main purpose of our study. Specifically, the two datasets were assessed over four 100 × 100 km² study areas in terms of three aspects: accuracy (including overall accuracy (OA), precision, recall, and F1), completeness (including area completeness and count completeness), and shape complexity. The results showed that: (1) Both datasets perform well in terms of OA (98% or above) and F1 (0.9 or above); the OSM dataset performs better in terms of precision, but the GSV dataset performs better in terms of recall. (2) The area completeness is almost 100%, but the count completeness is much higher than 100%, indicating that the total areas of the two datasets are almost the same but there are many more islands in the OSM dataset. (3) In most cases, the fractal dimension of the OSM dataset is relatively larger than that of the GSV dataset in terms of shape complexity, indicating that the OSM dataset has more detail along island boundaries and coastlines. We conclude that both datasets (GSV and OSM) are effective for island mapping, but the OSM dataset can identify more small islands and has more detail.
11

Dhadhal, Hema, and Paresh Kotak. "Leveraging datasets for effective mitigation of DDoS attacks in software-defined networking: significance and challenges." Radioelectronic and Computer Systems 2024, no. 2 (2024): 136–46. http://dx.doi.org/10.32620/reks.2024.2.11.

Abstract:
Software-defined networking (SDN) has emerged as a transformative paradigm for network management, offering centralized control and programmability. However, with the proliferation of distributed denial of service (DDoS) attacks posing significant threats to network infrastructures, effective mitigation strategies are needed. This study explores the importance of datasets in the mitigation of DDoS attacks in SDN environments. The paper discusses the significance of datasets for training machine learning models, evaluating detection mechanisms, and enhancing the resilience of SDN-based defense systems, and it outlines the challenges associated with dataset collection, labeling, and management, along with potential solutions to address them. The goal of the paper is to assist researchers in effectively selecting and using datasets for DDoS mitigation in SDN, thereby maximizing benefits and overcoming the challenges involved in dataset selection. The paper covers the importance of datasets in DDoS attack mitigation in SDN, the challenges of dataset utilization, guidelines for dataset selection, a comparison of the datasets used in prior work and their results, and the choice of datasets according to need. The methodology involves collecting results in tabular form from prior research to analyze the characteristics of existing datasets, techniques for dataset augmentation and enhancement, and the effectiveness of different datasets in detecting and mitigating DDoS attacks through comprehensive experimentation. The findings indicate that effective detection and mitigation of DDoS attacks in SDN require robust datasets that capture the diverse and evolving nature of attack scenarios. They provide valuable insights into the importance of datasets in enhancing the resilience of SDN infrastructures against DDoS attacks, together with thorough guidelines for dataset selection, the impacts of the datasets used in recent studies, and research challenges and future directions in this critical area.
12

Ooghe, Hubert, and Sofie Balcaen. "Are Failure Prediction Models Widely Usable? An Empirical Study Using a Belgian Dataset." Multinational Finance Journal 11, no. 1/2 (2007): 33–76. http://dx.doi.org/10.17578/11-1/2-2.

13

Zago, Mattia, Stefano Longari, Andrea Tricarico, et al. "ReCAN - Dataset for reverse engineering of Controller Area Networks." Data in Brief 29 (January 22, 2020): 105149. https://doi.org/10.1016/j.dib.2020.105149.

Abstract:
This article details the methodology and approach used to extract and decode data obtained from the Controller Area Network (CAN) buses of two personal vehicles and three commercial trucks, for a total of 36 million data frames. The dataset is composed of two complementary parts, namely the raw data and the decoded data. Along with the description of the data, this article also reports the hardware and software requirements needed, first, to extract the data from the vehicles and, second, to decode the binary data frames to obtain the actual sensor values. Finally, to enable reproducibility of the analysis and future research, the code snippets described in pseudo-code are publicly available in a code repository. Sufficiently motivated actors may intercept, interact with, and recognize vehicle data using consumer-grade technology, refuting once again the security-through-obscurity paradigm used by automotive manufacturers as a primary defensive countermeasure.
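To illustrate the kind of decoding step the article describes, here is a minimal sketch of extracting a physical signal from a raw CAN frame payload; the start bit, length, scale, and offset values are hypothetical and are not taken from the ReCAN dataset.

```python
# Illustrative decoding of one signal from a raw 8-byte CAN payload.
# The signal definition (start bit, length, scale, offset) is hypothetical.

def decode_signal(payload: bytes, start_bit: int, length: int,
                  scale: float, offset: float) -> float:
    """Extract a little-endian unsigned signal and convert it to physical units."""
    raw = int.from_bytes(payload, byteorder="little")   # frame as one integer
    value = (raw >> start_bit) & ((1 << length) - 1)    # mask the signal bits
    return value * scale + offset                       # apply scaling

# Example: a hypothetical 16-bit vehicle-speed signal at bit 16,
# scaled by 0.01 km/h per bit -> prints 80.0.
frame = bytes.fromhex("0011401F00000000")
print(decode_signal(frame, start_bit=16, length=16, scale=0.01, offset=0.0))
```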
14

Tariku, Girma, Isabella Ghiglieno, Andres Sanchez Morchio, et al. "Deep-Learning-Based Land Cover Mapping in Franciacorta Wine Growing Area." Applied Sciences 15, no. 2 (2025): 871. https://doi.org/10.3390/app15020871.

Abstract:
Land cover mapping is essential to understanding global land-use patterns and studying biodiversity composition and the functioning of ecosystems. The introduction of remote sensing technologies and artificial intelligence models has made it possible to base land cover mapping on satellite imagery in order to monitor changes, assess ecosystem health, support conservation efforts, and reduce monitoring time. However, significant challenges remain: managing large, complex satellite imagery datasets, acquiring specialized datasets (which is costly and labor-intensive), and a lack of comparative studies for selecting optimal deep learning models. No less important is the scarcity of aerial datasets specifically tailored to agricultural areas. This study addresses these gaps by presenting a methodology for semantic segmentation of land cover in agricultural areas using satellite images and deep learning models with pre-trained backbones. We introduce an efficient methodology for preparing semantic segmentation datasets and contribute the "Land Cover Aerial Imagery" (LICAI) dataset for semantic segmentation. The study focuses on the Franciacorta area, Lombardy Region, leveraging the rich diversity of the dataset to effectively train and evaluate the models. We conducted a comparative study using cutting-edge deep-learning-based segmentation models (U-Net, SegNet, DeepLabV3) with various pre-trained backbones (ResNet, Inception, DenseNet, EfficientNet) on our dataset acquired from Google Earth Pro. Through meticulous data acquisition, preprocessing, model selection, and evaluation, we demonstrate the effectiveness of these techniques in accurately identifying land cover classes. Integrating pre-trained feature extraction networks significantly improves performance across various metrics. Additionally, addressing challenges such as data availability, computational resources, and model interpretability is essential for advancing the field of remote sensing, in support of biodiversity conservation, the provision of ecosystem services, and sustainable agriculture.
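As a rough sketch of the kind of setup the study compares (a U-Net-style model with a pre-trained backbone), the following assumes the third-party segmentation_models_pytorch package; the encoder choice and class count are illustrative, not the paper's configuration.

```python
# Hypothetical U-Net with a pre-trained ResNet backbone, in the spirit of the
# model comparison described above (assumes segmentation_models_pytorch).
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet34",        # pre-trained feature extraction backbone
    encoder_weights="imagenet",     # transfer learning from ImageNet
    in_channels=3,                  # RGB aerial imagery
    classes=6,                      # illustrative number of land cover classes
)

x = torch.randn(2, 3, 256, 256)     # batch of two 256x256 RGB tiles
with torch.no_grad():
    logits = model(x)               # (2, 6, 256, 256) per-pixel class scores
print(logits.shape)
```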
15

Eum, Hyung-Il, and Anil Gupta. "Hybrid climate datasets from a climate data evaluation system and their impacts on hydrologic simulations for the Athabasca River basin in Canada." Hydrology and Earth System Sciences 23, no. 12 (2019): 5151–73. http://dx.doi.org/10.5194/hess-23-5151-2019.

Abstract:
A reliable climate dataset is the backbone for modelling the essential processes of the water cycle and predicting future conditions. Although a number of gridded climate datasets are available for the North American continent that provide reasonable estimates of climatic conditions in the region, there are inherent inconsistencies among them (e.g., spatially and temporally varying data accuracies, meteorological parameters, lengths of records, spatial coverage, temporal resolution, etc.). These inconsistencies raise questions as to which datasets are most suitable for a study area and how to systematically combine them to produce a reliable climate dataset for climate studies and hydrological modelling. This study suggests a framework called the REFerence Reliability Evaluation System (REFRES) that systematically ranks multiple climate datasets to generate a hybrid climate dataset for a region. To demonstrate the usefulness of the proposed framework, REFRES was applied to produce a historical hybrid climate dataset for the Athabasca River basin (ARB) in Alberta, Canada, and a proxy validation was conducted to prove the applicability of the generated hybrid dataset to hydrologic simulations. The study evaluated five climate datasets: the station-based gridded datasets ANUSPLIN (Australia National University Spline), Alberta Township, and the Pacific Climate Impacts Consortium's (PCIC) PNWNAmet (PCIC NorthWest North America meteorological dataset); a multi-source gridded dataset (Canadian Precipitation Analysis; CaPA); and a reanalysis-based dataset (North American Regional Reanalysis; NARR). The results showed that the gridded climate data interpolated from station observations performed better than the multi-source and reanalysis-based climate datasets. For the Athabasca River basin, Township and ANUSPLIN ranked first for precipitation and temperature, respectively. The proxy validation also confirmed the utility of hybrid climate datasets in hydrologic simulations compared with the five individual climate datasets investigated in this study. These results indicate that the hybrid climate dataset provides the best representation of historical climatic conditions and thus enhances the reliability of hydrologic simulations.
16

van Niel, T. G., and T. R. McVicar. "Assessing positional accuracy and its effects on rice crop area measurement: an application at Coleambally Irrigation Area." Australian Journal of Experimental Agriculture 41, no. 4 (2001): 557. http://dx.doi.org/10.1071/ea00140.

Abstract:
If management decisions are based on geospatial data that have not been assessed for spatial accuracy, then debate about both the measurements and the decisions themselves can occur. This debate, in part, can be avoided by evaluating the spatial accuracy of geospatial data, leading to heightened confidence in both the data and the decisions made from the data. To increase the effectiveness of environmental compliance monitoring, the spatial accuracies of 2 Geographic Information System datasets were estimated at the Coleambally Irrigation Area, New South Wales. The first, high-resolution digital aerial photography acquired in January 2000, is the Geographic Information System baseline data for Coleambally Irrigation Area. The second, Digital Topographic Data Base roads data, although not a reference dataset at Coleambally Irrigation Area, is often used as a baseline dataset across Australia. Neither dataset met the National Mapping Council of Australia’s standard of map accuracy, so a new version of the digital aerial photography was created that did. The positional accuracy of the improved dataset was over 4 times more accurate than the Digital Topographic Data Base roads dataset and over 2.5 times more accurate than the original digital aerial photography. It was also found that the overall areal error of paddocks measured from the improved dataset decreased as more paddock areas were added together; a finding that has a direct impact on management decisions at Coleambally Irrigation Area. This study both provides a demonstration of how to assess and improve spatial accuracy and shows that this process is not unduly complicated.
17

Wang, Rongfang, Chenchen Zhang, Chao Chen, Hongxia Hao, Weibin Li, and Licheng Jiao. "A Multi-Modality Fusion and Gated Multi-Filter U-Net for Water Area Segmentation in Remote Sensing." Remote Sensing 16, no. 2 (2024): 419. http://dx.doi.org/10.3390/rs16020419.

Abstract:
Water area segmentation in remote sensing is of great importance for flood monitoring. To overcome some challenges in this task, we construct the Water Index and Polarization Information (WIPI) multi-modality dataset and propose a multi-Modality Fusion and Gated multi-Filter U-Net (MFGF-UNet) convolutional neural network. The WIPI dataset can enhance the water information while reducing the data dimensionality: specifically, the Cloud-Free Label provided in the dataset can effectively alleviate the problem of labeled sample scarcity. Since a single form or uniform kernel size cannot handle the variety of sizes and shapes of water bodies, we propose the Gated Multi-Filter Inception (GMF-Inception) module in our MFGF-UNet. Moreover, we utilize an attention mechanism by introducing a Gated Channel Transform (GCT) skip connection and integrating GCT into GMF-Inception to further improve model performance. Extensive experiments on three benchmarks, including the WIPI, Chengdu and GF2020 datasets, demonstrate that our method achieves favorable performance with lower complexity and better robustness against six competing approaches. For example, on the WIPI, Chengdu and GF2020 datasets, the proposed MFGF-UNet model achieves F1 scores of 0.9191, 0.7410 and 0.8421, respectively, with the average F1 score on the three datasets 0.0045 higher than that of the U-Net model; likewise, GFLOPS were reduced by 62% on average. The new WIPI dataset, the code and the trained models have been released on GitHub.
18

Chen, Jyun-Ru, Kuei-Yuan Hou, Yung-Chen Wang, et al. "Enhanced Malignancy Prediction of Small Lung Nodules in Different Populations Using Transfer Learning on Low-Dose Computed Tomography." Diagnostics 15, no. 12 (2025): 1460. https://doi.org/10.3390/diagnostics15121460.

Abstract:
Background: Predicting malignancy in small lung nodules (SLNs) across diverse populations is challenging due to significant demographic and clinical variations. This study investigates whether transfer learning (TL) can improve malignancy prediction for SLNs using low-dose computed tomography across datasets from different countries. Methods: We collected two datasets: an Asian dataset (669 SLNs from Cathay General Hospital, CGH, Taiwan) and an American dataset (600 SLNs from the National Lung Screening Trial, NLST, America). Initial U-Net models for malignancy prediction were trained on each dataset, followed by the application of TL to transfer model parameters across datasets. Model performance was evaluated using accuracy, specificity, sensitivity, and the area under the receiver operating characteristic curve (AUC). Results: Significant demographic differences (p < 0.001) were observed between the CGH and NLST datasets. Initial models trained on one dataset showed a substantial performance decline of 15.2% to 97.9% when applied to the other dataset. TL enhanced model performance across datasets by 21.1% to 159.5% (p < 0.001), achieving an accuracy of 0.86–0.91, sensitivity of 0.81–0.96, specificity of 0.89–0.92, and an AUC of 0.90–0.97. Conclusions: TL enhances SLN malignancy prediction models by addressing population variations and enabling their application across diverse international datasets.
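The cross-dataset transfer learning idea described above can be sketched as follows in PyTorch; the small CNN is a placeholder for the study's U-Net-based models, and the freeze-the-backbone strategy shown is one common transfer option, not necessarily the authors' exact procedure.

```python
# Sketch: fine-tune a model pre-trained on one dataset (e.g., CGH) on a second
# dataset (e.g., NLST). Architecture and hyperparameters are placeholders.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, 2)  # benign vs. malignant

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SmallCNN()
# model.load_state_dict(torch.load("pretrained_on_source_dataset.pt"))  # transfer step

# Freeze the feature extractor; re-train only the classifier head on the target data.
for p in model.features.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 1, 64, 64)           # placeholder CT nodule patches
y = torch.randint(0, 2, (4,))           # placeholder labels
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```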
19

Huang, Xin, Jie Yang, Wenrui Wang, and Zhengrong Liu. "Mapping 10 m global impervious surface area (GISA-10m) using multi-source geospatial data." Earth System Science Data 14, no. 8 (2022): 3649–72. http://dx.doi.org/10.5194/essd-14-3649-2022.

Abstract:
Artificial impervious surface area (ISA) documents the human footprint. Accurate, timely, and detailed ISA datasets are therefore essential for global climate change studies and urban planning. However, due to the lack of sufficient training samples and operational mapping methods, global ISA datasets at a 10 m resolution are still lacking. To this end, we proposed a global ISA mapping method leveraging multi-source geospatial data. Based on existing satellite-derived ISA maps and crowdsourced OpenStreetMap (OSM) data, 58 million training samples were extracted via a series of temporal, spatial, spectral, and geometric rules. We then produced a 10 m resolution global ISA dataset (GISA-10m) from over 2.7 million Sentinel optical and radar images on the Google Earth Engine platform. Based on test samples independent of the training set, GISA-10m achieves an overall accuracy of greater than 86%. In addition, the GISA-10m dataset was comprehensively compared with existing global ISA datasets, and its superiority was confirmed. The global road area was further investigated, courtesy of this 10 m dataset. It was found that China and the US have the largest areas of ISA and road. Global rural ISA was found to be 2.2 times that of urban areas, while rural road area was found to be 1.5 times that of urban regions. The global road area accounts for 14.2% of the global ISA, 57.9% of which is located in the top 10 countries. Generally speaking, the produced GISA-10m dataset and the proposed sampling and mapping method achieve rapid and efficient global mapping and have the potential to detect other land covers. It is also shown that global ISA mapping can be improved by incorporating OSM data. The GISA-10m dataset could serve as a fundamental parameter for Earth system science and will provide valuable support for urban planning and water cycle studies. GISA-10m can be freely downloaded from https://doi.org/10.5281/zenodo.5791855 (Huang et al., 2021a).
20

Kotan, Muhammed, Ömer Faruk Seymen, Levent Çallı, Sena Kasım, Burcu Çarklı Yavuz, and Tijen Över Özçelik. "A novel methodological approach to SaaS churn prediction using whale optimization algorithm." PLOS ONE 20, no. 5 (2025): e0319998. https://doi.org/10.1371/journal.pone.0319998.

Abstract:
Customer churn is a critical concern in the Software as a Service (SaaS) sector, potentially impacting long-term growth within the cloud computing industry. The scarcity of research on customer churn models in SaaS, particularly regarding diverse feature selection methods and predictive algorithms, highlights a significant gap; addressing it would enhance academic discourse and provide essential insights for managerial decision-making. This study introduces a novel approach to SaaS churn prediction using the Whale Optimization Algorithm (WOA) for feature selection. The results show that WOA-reduced datasets improve processing efficiency and outperform full-variable datasets in predictive performance. The study evaluates three distinct datasets derived from over 1,000 users of a multinational SaaS company: the WOA-reduced dataset, the full-variable dataset, and a chi-squared-derived dataset. These datasets were examined with the techniques most used in the literature (k-nearest neighbor, decision trees, naïve Bayes, random forests, and neural networks), and performance metrics such as area under the curve, accuracy, precision, recall, and F1 score were used to measure classification success. The results demonstrate that the WOA-reduced dataset outperformed the full-variable and chi-squared-derived datasets on these performance metrics.
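For a sense of how WOA-based wrapper feature selection works, here is a simplified binary WOA sketch; the fitness function, sigmoid transfer rule, classifier, and parameters are common textbook choices and stand-in data, not the exact setup of the study above.

```python
# Simplified binary Whale Optimization Algorithm (WOA) for feature selection.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           random_state=0)   # stand-in for the churn data

def fitness(mask: np.ndarray) -> float:
    """Cross-validated accuracy minus a small penalty on feature count."""
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask.astype(bool)], y,
                          cv=3).mean()
    return acc - 0.01 * mask.mean()

def binarize(pos: np.ndarray) -> np.ndarray:
    """Stochastic sigmoid transfer from continuous positions to a 0/1 mask."""
    return (1 / (1 + np.exp(-pos)) > rng.random(pos.shape)).astype(int)

n_whales, n_feat, n_iter = 10, X.shape[1], 20
pos = rng.normal(size=(n_whales, n_feat))    # continuous whale positions
best, best_fit = None, -np.inf
for p in pos:
    f = fitness(binarize(p))
    if f > best_fit:
        best_fit, best = f, p.copy()

for t in range(n_iter):
    a = 2 - 2 * t / n_iter                   # 'a' decreases linearly 2 -> 0
    for i in range(n_whales):
        r1, r2 = rng.random(n_feat), rng.random(n_feat)
        A, C = 2 * a * r1 - a, 2 * r2
        if rng.random() < 0.5:               # encircling prey / exploration
            # mean |A| < 1 approximates the usual per-whale |A| switch
            ref = best if np.abs(A).mean() < 1 else pos[rng.integers(n_whales)]
            pos[i] = ref - A * np.abs(C * ref - pos[i])
        else:                                # spiral update around the best whale
            l = rng.uniform(-1, 1, n_feat)
            pos[i] = np.abs(best - pos[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
        f = fitness(binarize(pos[i]))
        if f > best_fit:
            best_fit, best = f, pos[i].copy()

print("selected features:", np.flatnonzero(binarize(best)))
```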
21

Vincke, S., and M. Vergauwen. "Geo-Registering Consecutive Datasets by Means of a Reference Dataset, Eliminating Ground Control Point Indication." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-5/W2 (September 20, 2019): 85–91. http://dx.doi.org/10.5194/isprs-archives-xlii-5-w2-85-2019.

Abstract:
The architecture, engineering and construction (AEC) industry's interest in more advanced ways of regularly monitoring construction site activities and achieved building progress has been rising recently. This requires frequent recordings of the area, which are only feasible if the observations require limited time, both for the actual capturing on-site and for the processing of the recorded data. Moreover, for monitoring purposes, it is vital that all datasets use a single, unique reference system. This allows an easy comparison of successive observations to determine building progress as well as possible construction deviations or errors.

In this work, a framework is proposed that facilitates a faster and more efficient way of co-registering or geo-registering consecutive datasets. It comprises three major stages, starting with the capturing of the surroundings of the construction site. By thoroughly adding numerous ground control points (GCPs) in a second phase, the processed result of this input data can be considered a reference dataset. In a third stage, this known component is used as additional input for the processing of subsequently captured datasets. Using overlapping areas, the new observations can be immediately transferred to the correct reference system. This eliminates the indication of GCPs in subsequent datasets, which is known to be time-consuming and error-prone.

Although the focus of the proposed framework in this work lies on a photogrammetric recording approach, it is also applicable to laser scanning. Its potential is showcased on a real-world apartment construction site in Ghent, Belgium. In this test case, the presented approach is shown to be efficient, with accuracies comparable to other current methods while requiring less time and effort.
22

Thomas, Abraham. "Detection of land use / cover changes of the KOSH region over a period of 14 years using the South African National Land Cover datasets for 2000 and 2014." South African Journal of Geomatics 8, no. 2 (2022): 108–29. http://dx.doi.org/10.4314/sajg.v8i2.1.

Abstract:
Simple algebraic change detection techniques, viz. image difference and image ratio, were applied to the South African national land use / cover (NLC) datasets of 2000 and 2014, prepared in grid format covering the Klerksdorp–Orkney–Stilfontein–Hartebeestfontein (KOSH) region, in order to assess land use / land cover changes. The 2000 and 2014 NLC datasets were generated from Landsat images using different classification schemes, and the code values and attributes of the land cover classes of the two datasets were not directly comparable. To make the datasets comparable for change detection, the NLC2000 dataset was examined in ArcView GIS by superimposing it onto the NLC2014 dataset, and similarities and differences were identified. For each cover type of the NLC2000 dataset, the comparable cover type of the 2014 dataset was identified by querying the NLC2000 dataset and viewing the spatial distributions of selected units against the NLC2014 dataset. Suitable code values from the NLC2014 dataset were identified for the NLC2000 dataset, which was then reclassified. The change detection study reveals increases in area for the cover types Cultivated common fields (low), Cultivated common fields (med), Mines 2 semi-bare, Wetlands, Urban commercial, and Plantations/woodlots mature. Grassland, Thicket/dense bush, Urban residential (dense trees/bush), Mines 1 bare, and Cultivated common pivots (high) showed decreases in places. During the 14 years, Grassland decreased from 2,132.47 km² (77.35% of the total area) to 1,629.78 km² (59.11% of the total area) owing to landscape transformation to other land covers (e.g., Cultivated common fields and Urban residential) driven by human activities. The percentage increases in area observed for the Cultivated common fields (low and medium) were 8.21% and 2.96%, while Mines 2 semi-bare, Wetlands, Urban commercial, and Plantations/woodlots mature showed increases of 0.67%, 0.32%, 0.28%, and 0.23%, respectively. The area of Thicket/dense bush decreased from 108.15 km² to 56.71 km² (a change of 1.87%). The maps of land use / land cover changes and the statistics obtained for the changed areas are very useful for identifying the changes occurring in different classes and for monitoring land use dynamics.
23

Ueda, Daiju, Akira Yamamoto, Naoyoshi Onoda, et al. "Development and validation of a deep learning model for detection of breast cancers in mammography from multi-institutional datasets." PLOS ONE 17, no. 3 (2022): e0265751. http://dx.doi.org/10.1371/journal.pone.0265751.

Abstract:
Objectives: The objective of this study was to develop and validate a state-of-the-art, deep learning (DL)-based model for detecting breast cancers on mammography. Methods: Mammograms in a hospital development dataset, a hospital test dataset, and a clinic test dataset were retrospectively collected from January 2006 through December 2017 at Osaka City University Hospital and Medcity21 Clinic. The hospital development dataset and a publicly available Digital Database for Screening Mammography (DDSM) dataset were used to train and validate RetinaNet, one type of DL-based model, with five-fold cross-validation. The model's sensitivity, mean false positive indications per image (mFPI), and partial area under the curve (AUC) at 1.0 mFPI were assessed externally on both test datasets. Results: The hospital development dataset, hospital test dataset, clinic test dataset, and DDSM development dataset included a total of 3179 images (1448 malignant), 491 images (225 malignant), 2821 images (37 malignant), and 1457 malignant images, respectively. The proposed model detected all cancers with a 0.45–0.47 mFPI and had partial AUCs of 0.93 on both test datasets. Conclusions: The DL-based model developed for this study was able to detect all breast cancers with a very low mFPI. Our DL-based model achieved the highest performance to date, which might lead to improved diagnosis of breast cancer.
24

Lo, Jui-En, Eugene Yu-Chuan Kang, Yun-Nung Chen, et al. "Data Homogeneity Effect in Deep Learning-Based Prediction of Type 1 Diabetic Retinopathy." Journal of Diabetes Research 2021 (December 28, 2021): 1–9. http://dx.doi.org/10.1155/2021/2751695.

Abstract:
This study aimed to evaluate a deep transfer learning-based model for identifying diabetic retinopathy (DR) that was trained using a dataset with high variability and predominantly type 2 diabetes (T2D), and to compare model performance with that in patients with type 1 diabetes (T1D). The publicly available Kaggle dataset was divided into training and testing Kaggle datasets. For the comparison dataset, we collected retinal fundus images of T1D patients at Chang Gung Memorial Hospital in Taiwan from 2013 to 2020 and divided them into training and testing T1D datasets. The model was developed using four different convolutional neural networks (Inception-V3, DenseNet-121, VGG16, and Xception). Model performance in predicting DR was evaluated using testing images from each dataset, and the area under the curve (AUC), sensitivity, and specificity were calculated. The model trained on the Kaggle dataset had an average (range) AUC of 0.74 (0.03) and 0.87 (0.01) on the testing Kaggle and T1D datasets, respectively. The model trained on the T1D dataset had an AUC of 0.88 (0.03), which decreased to 0.57 (0.02) on the testing Kaggle dataset. Heatmaps showed that the model focused on retinal hemorrhages, vessels, and exudation to predict DR. In wrongly predicted images, artifacts and low image quality affected model performance. The model developed with the high-variability, T2D-predominant dataset could be applied to T1D patients. Dataset homogeneity could affect the performance, trainability, and generalization of the model.
25

Xie, Ning-Ning, Fang-Fang Wang, Jue Zhou, Chang Liu, and Fan Qu. "Establishment and Analysis of a Combined Diagnostic Model of Polycystic Ovary Syndrome with Random Forest and Artificial Neural Network." BioMed Research International 2020 (August 20, 2020): 1–13. http://dx.doi.org/10.1155/2020/2613091.

Abstract:
Polycystic ovary syndrome (PCOS) is one of the most common metabolic and reproductive endocrinopathies. However, few studies have tried to develop a diagnostic model based on gene biomarkers. In this study, we applied a computational method combining two machine learning algorithms, random forest (RF) and artificial neural network (ANN), to identify gene biomarkers and construct a diagnostic model. We collected gene expression data from the Gene Expression Omnibus (GEO) database containing 76 PCOS samples and 57 normal samples; five datasets were utilized, including one dataset for screening differentially expressed genes (DEGs), two training datasets, and two validation datasets. First, based on RF, 12 key genes among the 264 DEGs were identified as vital for classifying PCOS and normal samples. Moreover, the weights of these key genes were calculated using an ANN with the microarray and RNA-seq training datasets, respectively. Furthermore, diagnostic models for the two types of datasets were developed and named neuralPCOS. Finally, the two validation datasets were used to test and compare the performance of neuralPCOS with two other sets of marker genes using the area under the curve (AUC). Our model achieved an AUC of 0.7273 on the microarray dataset and 0.6488 on the RNA-seq dataset. To conclude, we uncovered gene biomarkers and developed a novel diagnostic model for PCOS, which would be helpful for diagnosis.
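A minimal sketch of the two-stage idea described above (random forest to rank genes, then a small neural network trained on the selected genes), using scikit-learn; the sample sizes, gene counts, and hyperparameters are illustrative assumptions with placeholder data.

```python
# Sketch: RF-based gene selection followed by an ANN classifier and AUC
# evaluation (illustrative stand-in for the neuralPCOS pipeline above).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(133, 264))            # 133 samples x 264 DEGs (placeholder)
y = rng.integers(0, 2, size=133)           # 1 = PCOS, 0 = normal (placeholder)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Stage 1: rank genes by random forest importance, keep the top 12.
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
top12 = np.argsort(rf.feature_importances_)[::-1][:12]

# Stage 2: train a small neural network on the selected genes only.
ann = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
ann.fit(X_tr[:, top12], y_tr)

auc = roc_auc_score(y_te, ann.predict_proba(X_te[:, top12])[:, 1])
print(f"validation AUC: {auc:.3f}")   # ~0.5 here, since the data are random
```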
26

Yamada, Yusuke, Toshihiro Ohkubo, and Katsuto Shimizu. "Causal Analysis of Accuracy Obtained Using High-Resolution Global Forest Change Data to Identify Forest Loss in Small Forest Plots." Remote Sensing 12, no. 15 (2020): 2489. http://dx.doi.org/10.3390/rs12152489.

Abstract:
Identifying areas of forest loss is a fundamental aspect of sustainable forest management. Global Forest Change (GFC) datasets developed by Hansen et al. (in Science 342:850–853, 2013) are publicly available, but the accuracy of these datasets for small forest plots has not been assessed. We used a forest-wide polygon-based approach to assess the accuracy of using GFC data to identify areas of forest loss in an area containing numerous small forest plots. We evaluated the accuracy of detection of individual forest-loss polygons in the GFC dataset in terms of a “recall ratio”, the ratio of the spatial overlap of a forest-loss polygon determined from the GFC dataset to the area of a corresponding reference forest-loss polygon, which we determined by visual interpretation of aerial photographs. We analyzed the structural relationships of recall ratio with area of forest loss, tree species, and slope of the forest terrain by using linear non-Gaussian acyclic modelling. We showed that only 11.1% of forest-loss polygons in the reference dataset were successfully identified in the GFC dataset. The inferred structure indicated that recall ratio had the strongest relationships with area of forest loss, forest tree species, and height of the forest canopy. Our results indicate the need for careful consideration of structural relationships when using GFC datasets to identify areas of forest loss in regions where there are small forest plots. Moreover, further studies are required to examine the structural relationships for accuracy of land-use classification in forested areas in various regions and with different forest characteristics.
27

Vijayalakshmi, K., and V. Vijay Kumar. "Performance Analysis of Remote Sensing Application Using Area Wise Prediction." Oriental Journal of Computer Science and Technology 12, no. 1 (2019): 21–27. http://dx.doi.org/10.13005/ojcst12.01.05.

Abstract:
Remote sensors on satellites or aircraft generate huge volumes of data that can be of significant use if the collected data are aggregated effectively and combined with insight information. Data are collected from devices ranging from simple to hybrid, which continuously operate around us, communicate with each other, and transfer huge amounts of real-time data daily. Retrieving useful information from this synchronized remote sensing data through proficient classification poses severe computational challenges in analyzing, sorting, and accumulating remotely gathered data, and real-time sensing devices export data continuously. In this work, we apply big data analytics to remote sensing datasets. We used the BEST software for header analysis of the datasets and for retrieving the full-resolution image from each dataset. The retrieved image is then divided into smaller blocks to which statistics are applied. By applying rules and conditions in the form of an algorithm, we determine the land and sea blocks of the image dataset. Our end results proficiently analyze real-time remote sensing using the land beacon structure. Finally, a comprehensive investigation of the remotely sensed earth beacon massive data for land and ocean areas is made available using Hadoop.
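To make the block-wise step above concrete, here is a small sketch that tiles an image into blocks and thresholds a per-block statistic to label land versus sea; the block size and threshold rule are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative block-wise land/sea labeling: tile an intensity image into
# blocks and threshold each block's mean (assumed rule, for illustration).
import numpy as np

def label_blocks(img: np.ndarray, block: int = 64, thresh: float = 0.35):
    """Return a (rows, cols) array: 1 = land block, 0 = sea block."""
    h, w = img.shape
    rows, cols = h // block, w // block
    labels = np.zeros((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            tile = img[r*block:(r+1)*block, c*block:(c+1)*block]
            labels[r, c] = int(tile.mean() > thresh)  # bright = land (assumed)
    return labels

# Toy image: dark "sea" with a bright "land" patch in one quadrant.
img = np.full((256, 256), 0.1)
img[64:192, 128:256] = 0.8
print(label_blocks(img))
```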
28

Irawan, Dasapta, Thomas Putranto, and Achmad Darul. "Groundwater quality dataset of Semarang area, Indonesia." Research Ideas and Outcomes 4 (October 16, 2018): e29319. https://doi.org/10.3897/rio.4.e29319.

Abstract:
Regional environmental changes are affecting groundwater ecosystems in the Semarang area. The development of new settlements, industrial complexes, and trade centers has degraded the groundwater setting of the city, which serves as the capital of Central Java Province. This led us to compile several groundwater quality datasets collected between 1992 and 2007. Our original motivation was to produce an open dataset that can be used as a baseline for groundwater monitoring. The dataset consists of 58 samples taken in 1992, 1993, 2003, 2006, and 2007, based on well point data from several reports by the Ministry of Energy and Mineral Resources, engineering consultants, and researchers from Universitas Diponegoro and Institut Teknologi Bandung. Each site has a set of 20 physical and chemical variables.
29

Sharma, Vijeta, Manjari Gupta, Ajai Kumar, and Deepti Mishra. "EduNet: A New Video Dataset for Understanding Human Activity in the Classroom Environment." Sensors 21, no. 17 (2021): 5699. http://dx.doi.org/10.3390/s21175699.

Abstract:
Human action recognition in videos has become a popular research area in artificial intelligence (AI) technology. In the past few years, this research has accelerated in areas such as sports, daily activities, and kitchen activities, due to the development of benchmark datasets for human action recognition in these areas. However, there is little benchmark-dataset research on human activity recognition in educational environments. We therefore developed a dataset of teacher and student activities to expand research into the education domain. This paper proposes a new dataset, called EduNet, and a novel approach towards developing human action recognition datasets for classroom environments. EduNet has 20 action classes, containing around 7851 manually annotated clips extracted from YouTube videos and recorded in actual classroom environments. Each action category has a minimum of 200 clips, and the total duration is approximately 12 h. To the best of our knowledge, EduNet is the first dataset specially prepared for classroom monitoring of both teacher and student activities. It is also a challenging action dataset owing to the large number of clips and their unconstrained nature. We compared the performance of the EduNet dataset with the benchmark video datasets UCF101 and HMDB51 on a standard I3D-ResNet-50 model, which resulted in 72.3% accuracy. The development of a new benchmark dataset for the education domain will benefit future research on classroom monitoring systems. The EduNet dataset is a collection of classroom activities from standard 1 to standard 12 schools.
30

Acquah, Joseph, Senyefia Bosson-Amedenu, and Eric Adubuah. "On the Implications of Ignoring Competing Risk in Survival Analysis: The Case of the Product-Limit Estimator." Journal of Mathematics Research 15, no. 5 (2023): 1. http://dx.doi.org/10.5539/jmr.v15n5p1.

Abstract:
Although the Kaplan-Meier (KM) estimator is a single-event model, it is frequently used in the literature with datasets that are assumed to be cause-specific without proper verification. It is crucial to evaluate the implications of this for the probability estimates. This study compares the estimates of the cumulative incidence functions to the complement of the product-limit estimator (1-KM). The KM was found to inflate probability estimates when the dataset is unverified for competing risks. Estimates with lower standard errors and a larger area under the Receiver Operating Characteristic (ROC) curve were associated with datasets verified for competing events, while estimates from datasets unverified for competing risks had a smaller area under the ROC curve and higher standard errors. The results support the idea that, since the product-limit estimator is a cause-specific model, it naturally performs better for a single event than in the presence of several events; hence, it is necessary to verify the dataset for competing events. The findings of this study clearly suggest that, before choosing a modelling strategy, one should check for competing risks in the survival dataset.
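To illustrate the comparison above, here is a small numpy sketch that computes 1-KM for one event type alongside the Aalen-Johansen cumulative incidence function (CIF) on toy competing-risks data; the data and event coding (0 = censored, 1 = event of interest, 2 = competing event) are invented for illustration, and ties are processed sequentially for simplicity.

```python
# Toy comparison of 1-KM (competing events treated as censoring) with the
# Aalen-Johansen cumulative incidence function for the event of interest.
import numpy as np

times  = np.array([2, 3, 3, 5, 6, 7, 8, 9, 11, 12])
events = np.array([1, 2, 1, 0, 1, 2, 1, 0,  2,  1])  # 0=censored, 1=interest, 2=competing

order = np.argsort(times)
times, events = times[order], events[order]

n = len(times)
surv_all = 1.0      # KM survival w.r.t. all event types (used by the CIF)
surv_1 = 1.0        # KM survival treating cause 2 as censoring
cif_1 = 0.0         # Aalen-Johansen CIF for cause 1

print(" t   1-KM    CIF")
for i, (t, e) in enumerate(zip(times, events)):
    at_risk = n - i
    if e == 1:
        cif_1 += surv_all * (1 / at_risk)   # hazard of cause 1 at time t
        surv_1 *= 1 - 1 / at_risk
    if e in (1, 2):
        surv_all *= 1 - 1 / at_risk
    print(f"{t:2d}  {1 - surv_1:.3f}  {cif_1:.3f}")
```

In the printed table, 1-KM rises above the CIF as soon as competing events occur, which is exactly the inflation the abstract describes.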
31

Torrie, Shad, Andrew Sumsion, Dah-Jye Lee, and Zheng Sun. "Data-Driven Advancements in Lip Motion Analysis: A Review." Electronics 12, no. 22 (2023): 4698. http://dx.doi.org/10.3390/electronics12224698.

Abstract:
This work reviews the dataset-driven advancements that have occurred in the area of lip motion analysis, particularly visual lip-reading and visual lip motion authentication, in the deep learning era. We provide an analysis of datasets and their usage, creation, and associated challenges. Future research can utilize this work as a guide for selecting appropriate datasets and as a source of insights for creating new and innovative datasets. Large and varied datasets are vital to a successful deep learning system, and many incredible advancements in these fields have been driven by larger datasets. There are indications that even larger, more varied datasets would further improve existing systems. We highlight the datasets that brought about the progression of lip-reading systems from digit- to word-level lip-reading, and then from word- to sentence-level lip-reading. Through an in-depth analysis of lip-reading system results, we show that datasets with large amounts of diversity improve results immensely. We then discuss the next step for lip-reading systems, moving from sentence- to dialogue-level lip-reading, and emphasize that new datasets are required to make this transition possible. We then explore lip motion authentication datasets. While lip motion authentication has been well researched, the field has not unified around a particular implementation, and there is no benchmark dataset for comparing the various methods. As seen in the lip-reading analysis, large, diverse datasets are required to evaluate the robustness and accuracy of new methods attempted by researchers. Such large datasets have pushed forward work in the visual lip-reading realm, but due to the lack of large, diverse, and publicly accessible datasets, visual lip motion authentication research has struggled to validate results and real-world applications. A new benchmark dataset is required to unify the studies in this area so that they can be compared to previous methods and new methods can be validated more effectively.
33

Xiong, Zili, Wei Shangguan, Vahid Nourani, et al. "Assessing the Reliability of Global Carbon Flux Dataset Compared to Existing Datasets and Their Spatiotemporal Characteristics." Climate 11, no. 10 (2023): 205. http://dx.doi.org/10.3390/cli11100205.

Full text
Abstract:
Land carbon fluxes play a critical role in ecosystems, and acquiring a comprehensive global database of carbon fluxes is essential for understanding the Earth’s carbon cycle. The primary methods of obtaining the spatial distribution of land carbon fluxes include utilizing machine learning models based on in situ measurements, estimating through satellite remote sensing, and simulating ecosystem models. Recently, an innovative machine learning product known as the Global Carbon Flux Dataset (GCFD) has been released. In this study, we assessed the reliability of the GCFD by comparing it with existing data products, including two machine learning products (FLUXCOM and NIES (National Institute for Environmental Studies)), two ecosystem model products (TRENDY and EC-LUE (eddy covariance–light use efficiency model)), and one remote sensing product (Global Land Surface Satellite), on both site and global scales. Our findings indicate that, in terms of average absolute difference, the spatial distribution of the GCFD is most similar to the NIES product, albeit with slightly larger discrepancies compared to the other two types of products. When using site observations as the benchmark, the gross primary production (GPP), ecosystem respiration (RECO), and net ecosystem exchange (NEE) of machine learning products exhibit higher R2 values (0.57–0.85, 0.53–0.79, and 0.31–0.70, respectively) than those of model products and remote sensing products. Furthermore, we analyzed the spatial and temporal distribution characteristics of carbon fluxes in various regions. The results demonstrate an upward trend in both GPP and RECO over the past two decades, while NEE exhibits an opposite trend. This trend is particularly pronounced in tropical regions, with higher GPP observed in tropical, subtropical, and oceanic climate zones. Additionally, two remote sensing variables that influence changes in carbon fluxes, i.e., the fraction of absorbed photosynthetically active radiation and the leaf area index, exhibit relatively consistent spatial and temporal characteristics. Overall, our study provides valuable insights into different types of carbon flux products and contributes to understanding the general features of global carbon fluxes.
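As a rough illustration of the site-scale benchmarking described above, an R2 and absolute-difference comparison might be sketched as follows; the GPP values and product errors are synthetic stand-ins, not GCFD data.

import numpy as np
from sklearn.metrics import r2_score

# Hypothetical site benchmark: observed GPP at flux towers vs. values
# extracted from two gridded products at the same locations.
rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 2.0, size=200)                 # site GPP, gC m-2 d-1
product_a = obs * 0.9 + rng.normal(0, 0.8, 200)     # product with bias + noise
product_b = obs * 1.1 + rng.normal(0, 1.5, 200)     # a noisier product

print(f"R2 product A vs sites: {r2_score(obs, product_a):.2f}")
print(f"R2 product B vs sites: {r2_score(obs, product_b):.2f}")
# Average absolute difference, as used for product-to-product comparison:
print(f"Mean |A - B|: {np.mean(np.abs(product_a - product_b)):.2f}")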
APA, Harvard, Vancouver, ISO, and other styles
34

Wang, Yihao, Jianyu Chen, Xuanqin Mou, et al. "Fusion of Hyperspectral and Multispectral Images with Radiance Extreme Area Compensation." Remote Sensing 16, no. 7 (2024): 1248. http://dx.doi.org/10.3390/rs16071248.

Full text
Abstract:
Although the fusion of multispectral (MS) and hyperspectral (HS) images in remote sensing has become relatively mature, and different types of fusion methods have their own characteristics in terms of fusion effect, data dependency, and computational efficiency, few studies have focused on the impact of radiance extreme areas, which widely exist in real remotely sensed scenes. To this end, this paper proposes a novel method called radiance extreme area compensation fusion (RECF). Based on the architecture of spectral unmixing fusion, our method uses the reconstruction error map to construct local smoothing constraints during unmixing and utilizes the nearest-neighbor multispectral data to achieve optimal replacement compensation, thereby eliminating the impact of overexposed and underexposed areas in hyperspectral data on the fusion effect. We compared the RECF method with 11 previously published methods on three sets of airborne hyperspectral datasets and HJ2 satellite hyperspectral data and quantitatively evaluated them using 5 metrics, including PSNR and SAM. On the test dataset with extreme radiance interference, the proposed RECF method performed well in the overall evaluation; for instance, the PSNR metric reached 47.6076 and SAM reached 0.5964 on the Xiong’an dataset. In addition, the results show that our method also achieved better visual effects on both simulated and real datasets.
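The two quoted metrics can be computed as below; this is a generic sketch with simulated arrays, not the authors' evaluation code.

import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio between two images scaled to [0, peak]."""
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def sam_degrees(ref, est, eps=1e-12):
    """Mean spectral angle mapper over all pixels, in degrees.
    ref, est: arrays of shape (height, width, bands)."""
    dot = np.sum(ref * est, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(est, axis=-1)
    angles = np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))
    return np.degrees(angles.mean())

rng = np.random.default_rng(1)
ref = rng.random((64, 64, 31))                 # simulated hyperspectral cube
est = ref + rng.normal(0, 0.01, ref.shape)     # a fused estimate
print(f"PSNR: {psnr(ref, est):.2f} dB, SAM: {sam_degrees(ref, est):.3f} deg")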
APA, Harvard, Vancouver, ISO, and other styles
35

Mohamed Taha, Abdallah M., Yantao Xi, Qingping He, Anqi Hu, Shuangqiao Wang, and Xianbin Liu. "Investigating the Capabilities of Various Multispectral Remote Sensors Data to Map Mineral Prospectivity Based on Random Forest Predictive Model: A Case Study for Gold Deposits in Hamissana Area, NE Sudan." Minerals 13, no. 1 (2022): 49. http://dx.doi.org/10.3390/min13010049.

Full text
Abstract:
Remote sensing data provide significant information about surface geological features, but they have not been fully investigated as a tool for delineating mineral prospective targets using the latest advancements in machine learning predictive modeling. In this study, besides available geological data (lithology, structure, lineaments), Landsat-8, Sentinel-2, and ASTER multispectral remote sensing data were processed to produce various predictor maps, which then formed four distinct datasets (namely Landsat-8, Sentinel-2, ASTER, and Data-integration). Remote sensing enhancement techniques, including band ratio (BR), principal component analysis (PCA), and minimum noise fraction (MNF), were applied to produce predictor maps related to hydrothermal alteration zones in the Hamissana area, while geological-based predictor maps were derived by applying spatial analysis methods. These four datasets were used independently to train a random forest algorithm (RF), which was then employed to conduct data-driven gold mineral prospectivity modeling (MPM) of the study area and to compare the capability of the different datasets. The modeling results revealed that the ASTER and Sentinel-2 datasets achieved very similar accuracy and outperformed the Landsat-8 dataset. Based on the area under the ROC curve (AUC), both datasets had the same prediction accuracy of 0.875. However, the ASTER dataset yielded the highest overall classification accuracy of 73%, which is 6% higher than Sentinel-2 and 13% higher than Landsat-8. By using the data-integration concept, the prediction accuracy increased by about 6% (AUC: 0.938) compared with the ASTER dataset. Hence, these results suggest that the framework of exploiting remote sensing data is promising and should be used as an alternative technique for MPM when data availability is an issue.
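A minimal sketch of this kind of data-driven MPM workflow, with an invented predictor stack standing in for the band-ratio/PCA/MNF and geological layers, might look like this:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical predictor stack: each row is a pixel, each column a predictor
# map (band ratios, PCA/MNF components, distance-to-structure, etc.).
rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 12))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 1, 5000) > 1.5).astype(int)  # 1 = known occurrence

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
prospectivity = rf.predict_proba(X_te)[:, 1]        # per-pixel prospectivity score
print(f"AUC: {roc_auc_score(y_te, prospectivity):.3f}")

In practice each dataset (Landsat-8, Sentinel-2, ASTER, integrated) would supply its own X, and the AUCs would be compared across datasets as the abstract describes.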
APA, Harvard, Vancouver, ISO, and other styles
36

Irhamsyah, Muhammad, Qurrata A’yuni, Khairun Saddami, Nasaruddin Nasaruddin, Khairul Munadi, and Fitri Arnia. "Impact of using various x-ray dataset in detecting tuberculosis based on deep learning." Radioelectronic and Computer Systems 2025, no. 1 (2025): 165–86. https://doi.org/10.32620/reks.2025.1.12.

Full text
Abstract:
The characteristics of tuberculosis are difficult to assess visually; therefore, a computer-aided system based on deep learning can be applied to X-ray image recognition. Many studies have been conducted in this area but have yet to achieve a high accuracy rate. The goal of this study is to determine the effect of using various datasets in developing deep learning models. The tasks to be solved include exploring various deep learning architectures, fine-tuning hyperparameters, and using various dataset sources. The method used is the development of a convolutional neural network (CNN) deep learning model using transfer learning to classify X-ray images into the binary classes of normal and tuberculosis (TB). The CNN architectures used are the pretrained networks ResNet and EfficientNet, along with their variants. The pre-trained networks were trained on a dataset obtained from four sources: Shenzhen, Montgomery, RSNA CXR, and Belarus. The dataset is divided into three schemes: Scheme one consists of the Shenzhen dataset with low-quality X-ray images; Scheme two comprises the Montgomery, RSNA, and Belarus datasets, which show good contrast in the indicated TB areas; and Scheme three contains datasets from all sources to provide more data for learning. Augmentation, dropout, and L2 regularization were also applied to enhance learning performance. The following results were obtained: the models performed better with the high-quality X-ray images in Scheme two but not with the large dataset in Scheme three. Regarding network performance, the models based on ResNet-101 and EfficientNetB0 outperformed the others, showing good-fit learning and the capability to recognize X-ray images with an accuracy of 99.2%. In conclusion, the best approach to enhance learning performance is to use high-quality input and apply regularization.
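A transfer-learning setup of the kind described might be sketched as follows in PyTorch; the frozen backbone, learning rate, and dummy batch are illustrative assumptions, not the authors' configuration.

import torch
import torch.nn as nn
from torchvision import models

# Pre-trained ResNet-101 with a new binary head (normal vs. TB).
model = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                       # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 2)     # new trainable classifier head

criterion = nn.CrossEntropyLoss()
# L2 regularization enters through weight_decay, as mentioned in the abstract.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4, weight_decay=1e-4)

x = torch.randn(4, 3, 224, 224)                   # dummy batch of X-ray crops
loss = criterion(model(x), torch.tensor([0, 1, 0, 1]))
loss.backward()
optimizer.step()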
APA, Harvard, Vancouver, ISO, and other styles
37

Afifuddin Arif Shihabuddin Arip, Norazlianie Sazali, Kumaran Kadirgama, Ahmad Shahir Jamaludin, Faiz Mohd Turan, and Norhaida Ab. Razak. "Object Detection for Safety Attire Using YOLO (You Only Look Once)." Journal of Advanced Research in Applied Mechanics 113, no. 1 (2024): 37–51. http://dx.doi.org/10.37934/aram.113.1.3751.

Full text
Abstract:
Personal protective equipment (PPE) usage is mandated for all employees to prevent workplace accidents and foster a safe and healthy work environment. Using YOLOv8 machine learning and Google Colab's web-based development environment, this research aims to create a real-time detection system for PPE violations in the workplace. By monitoring PPE compliance, the system is intended to increase workplace safety and prevent accidents. The dataset was collected through a mixture of real-life image gathering and internet datasets. Various images were collected to train the model to detect objects from afar, up close, and individually. The research methodology includes a literature review, data gathering, pre-processing, and model training. Three bounding-box detection classes were defined, corresponding to safety helmets, safety shoes, and gloves. The system successfully detected the classes with overall scores above 0.8: the safety helmet achieved 0.969 and the safety gloves 0.857, followed by the safety vest with 0.887. The findings from this study indicate that the developed system can effectively improve occupational safety and health management. However, detection errors occur under certain lighting and color conditions. Future research can focus on integrating the system with other work safety systems to provide a comprehensive solution for accident prevention.
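Assuming the standard Ultralytics API, the training and inference workflow could be sketched as below; 'ppe.yaml' and 'worker.jpg' are placeholder names, not files from the study.

from ultralytics import YOLO

# Load a pretrained YOLOv8 model and fine-tune it on a custom PPE dataset.
# 'ppe.yaml' is a placeholder data config listing image paths and the
# bounding-box classes for the safety attire to be detected.
model = YOLO("yolov8n.pt")
model.train(data="ppe.yaml", epochs=100, imgsz=640)

metrics = model.val()                              # mAP and per-class scores
results = model.predict("worker.jpg", conf=0.5)    # flag frames missing PPE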
APA, Harvard, Vancouver, ISO, and other styles
38

Wei, Yi, Balaji Selvaraj, Mayank Patwari, et al. "Abstract 5427: Improving non-small cell lung cancer segmentation on a challenging dataset." Cancer Research 83, no. 7_Supplement (2023): 5427. http://dx.doi.org/10.1158/1538-7445.am2023-5427.

Full text
Abstract:
When applied to different datasets, the performance of the same deep learning tumor segmentation model can vary greatly. In a non-small cell lung cancer CT scan segmentation study consisting of two datasets, we found that the SwinUNETR model achieves a state-of-the-art DICE score on the public NSCLC dataset but performs poorly on a private dataset of curated data collected clinically. This performance variation reduces the applicability of such models. To mitigate this gap, through experimentation, we identified a set of techniques and applied them in the following order: (1) normalize a dataset to reduce differences between images; (2) stratify a dataset according to tumor sizes to form a more diverse training set; (3) isolate the lung area before training to help the model focus on the right area; (4) before training, initialize models with self-supervised pre-training weights; (5) use a new loss function to give more weight to the cancerous area; (6) after a model is trained, perform 3-axis test-time flipping augmentation and ensemble the final predictions. In our experiments, this set of techniques improved the test DICE score on both datasets we tested, with the largest gain being a 53% improvement, from a DICE score of 0.32 to 0.49. Citation Format: Yi Wei, Balaji Selvaraj, Mayank Patwari, Qin Li, Meng Xu, Konstantinos Sidiropoulos, Zhenning Zhang, Leon Fedden, Anant Madabhushi, Mohammadhadi Khorrami, Vidya Sankar Viswanathan, Amit Gupta. Improving non-small cell lung cancer segmentation on a challenging dataset. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5427.
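Step (6) can be sketched as follows for a 3D segmentation network; the model interface and sigmoid output are assumptions for illustration, not the authors' implementation.

import torch

def tta_flip_ensemble(model, volume):
    """Average predictions over flips along each spatial axis of a 3D volume.
    volume: tensor of shape (batch, channels, D, H, W)."""
    preds = torch.sigmoid(model(volume))
    for axis in (2, 3, 4):                              # D, H, W axes
        flipped = torch.flip(volume, dims=[axis])
        pred = torch.sigmoid(model(flipped))
        preds = preds + torch.flip(pred, dims=[axis])   # flip back before averaging
    return preds / 4.0                                  # identity + 3 flips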
APA, Harvard, Vancouver, ISO, and other styles
39

Sannier, Christophe, Eva Ivits, Gergely Maucha, Joachim Maes, and Lewis Dijkstra. "Harmonized Pan-European Time Series for Monitoring Soil Sealing." Land 13, no. 7 (2024): 1087. http://dx.doi.org/10.3390/land13071087.

Full text
Abstract:
The European Copernicus Land Monitoring Service (CLMS) has been producing datasets on imperviousness every 3 years since 2006. However, for 2018, the input for the production of the imperviousness dataset was switched from mixed inputs to the Sentinel constellation. While this led to an improvement in spatial detail from 20 m to 10 m, it also resulted in a break in the time series, as the 2018 update was not comparable to the previous reference years. In addition, the European CLMS has been producing a new dataset from 2018 onward entitled CLC+ Backbone, which also includes a sealed area thematic class. When comparing both datasets with sampled reference data, it appears that the imperviousness dataset substantially underestimates sealed areas at the European level. However, the CLC+ dataset is only available from 2018 and currently does not include any change layer. To address these issues, a harmonized, combined continental soil sealing dataset for Europe was produced for the entire observation period. This new dataset has been validated as the best current dataset for monitoring soil sealing as a direct input for European policies, with an estimated total sealed area of 175,664 km2 over Europe and an increase in sealed areas of 1297 km2, or 0.7%, between 2015 and 2018, which is comparable to previous time periods. Finally, recommendations for future updates and the validation of imperviousness degree geospatial products are given.
APA, Harvard, Vancouver, ISO, and other styles
40

Minghini, Marco, Sara Thabit Gonzalez, and Lorenzo Gabrielli. "Pan-European open building footprints: analysis and comparison in selected countries." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-4/W12-2024 (June 20, 2024): 97–103. http://dx.doi.org/10.5194/isprs-archives-xlviii-4-w12-2024-97-2024.

Full text
Abstract:
This paper presents a comprehensive analysis of four non-governmental open building datasets available at the European Union (EU) level, namely OpenStreetMap (OSM), EUBUCCO, Digital Building Stock Model (DBSM) and Microsoft’s Global ML Building Footprints (MS). The objective is to perform a geometrical comparison and identify similarities and differences between them, across five EU countries (Belgium, Denmark, Greece, Malta and Sweden) and various degrees of urbanisation from rural to urban. This is done in a two-step process: first, by comparing the total number and the total areas of building polygons for each dataset and country; second, by intersecting the building polygons and calculating the fraction of the area of each dataset represented by the intersection. Results highlight the influence of urbanisation on the dataset coverage (with increasing completeness when moving from rural to urban areas) and the varying degrees of overlap between the datasets based on a number of factors, including: the amount and up-to-dateness of the input sources used to produce the dataset; the presence of an active OSM community (for OSM and the datasets based on OSM); and the accuracy of the machine learning algorithms for MS. Based on these findings, we provide insights into the strengths and limitations of each dataset and some recommendations on their use.
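The two-step geometric comparison maps naturally onto GeoPandas; in this sketch the file names are placeholders, and both layers are assumed to share a projected CRS so that areas are meaningful.

import geopandas as gpd

osm = gpd.read_file("osm_buildings.gpkg")        # placeholder paths
ms = gpd.read_file("ms_buildings.gpkg")

# Step 1: totals per dataset.
print(len(osm), osm.geometry.area.sum())
print(len(ms), ms.geometry.area.sum())

# Step 2: intersect the footprints and express the overlap as a fraction
# of each dataset's total built-up area.
inter = gpd.overlay(osm, ms, how="intersection")
overlap = inter.geometry.area.sum()
print(f"Overlap covers {overlap / osm.geometry.area.sum():.1%} of OSM area")
print(f"Overlap covers {overlap / ms.geometry.area.sum():.1%} of MS area")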
APA, Harvard, Vancouver, ISO, and other styles
41

Tanantong, Tanatorn, Nawarerk Chalarak, Sumet Jirattisak, Kitiya Tanantong, and Krittakom Srijiranon. "A Study on Area Assessment of Psoriasis Lesions Using Image Augmentation and Deep Learning: Addressing the Lack of Thai Skin Disease Data." Journal of Current Science and Technology 15, no. 3 (2025): 119. https://doi.org/10.59796/jcst.v15n3.2025.119.

Full text
Abstract:
Psoriasis is a chronic skin disease with significant global and regional impacts, including in Thailand, where its burden is compounded by diagnostic challenges and limited dermatological resources. Psoriasis was selected for this study because it develops in distinct phases, requiring ongoing monitoring and treatment. The distribution of skin lesions plays a crucial role in disease identification and assessment, making it an essential factor for AI-based analysis. The development of AI-based diagnostic tools offers a potential solution. However, there is no publicly available skin disease dataset in Thailand, and image annotation is a challenging and time-consuming task for dermatologists. This scarcity of annotated datasets remains a critical barrier to AI development. This study utilizes the Dermnet dataset and enhances it through image augmentation and style transfer techniques to generate a more diverse and representative dataset, particularly one reflecting Thai skin tones. It also evaluates how augmentation techniques affect AI performance in psoriasis classification. The results showed that augmentation significantly enhanced model performance, with EfficientNetB4 achieving the highest accuracy (93.00%) and sensitivity (91.19%). Style transfer emerged as a valuable technique, enabling the creation of skin-tone-representative datasets that improved model generalizability. These findings align with the existing literature and demonstrate that augmentation techniques can overcome data limitations and enhance model robustness. This study introduces a novel use of style transfer techniques, applied to generate augmented datasets that represent Thai skin tones and thereby address a critical gap in publicly available dermatology data. By enhancing dataset diversity, style transfer significantly improves the generalizability and accuracy of AI-based psoriasis classification models. These advancements have important implications for clinical practice, especially in Thailand and other resource-limited regions, where AI-assisted diagnostics can improve access to and the effectiveness of dermatological care.
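The augmentation side of such a pipeline might resemble the following torchvision sketch; the specific transforms and parameters are assumptions, and the style-transfer step that shifts lesions toward Thai skin tones is a separate model not shown here.

import torchvision.transforms as T

# Generic augmentation pipeline of the kind the abstract describes
# (flips, rotations, color jitter); applied to PIL lesion images.
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomRotation(degrees=15),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.Resize((380, 380)),          # EfficientNetB4's native input size
    T.ToTensor(),
])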
APA, Harvard, Vancouver, ISO, and other styles
42

Keys, Laura, and Jussi Baade. "Uncertainty in Catchment Delineations as a Result of Digital Elevation Model Choice." Hydrology 6, no. 1 (2019): 13. http://dx.doi.org/10.3390/hydrology6010013.

Full text
Abstract:
Nine digital elevation model (DEM) datasets were used for separate delineations of the Nam Co, Tibet catchment and its subcatchments, and the delineated areas were compared using the highest-resolution dataset, TanDEM-X 12 m, as a baseline. The mean delineated catchment area was within 0.1% of the baseline delineation, with a standard error of the mean (SEM) that was 0.13% of the baseline. In a comparison of 49 subcatchment areas, the TanDEM-X and ALOS datasets delineated similar areas, followed closely by SRTM 30 m, then SRTM 90 m, ACE2, and ASTER GDEM1. ASTER GDEM2 was a noteworthy outlier, having the largest mean subcatchment area, nearly three times the baseline mean. Correlation coefficients were calculated for subcatchment parameters, SEM, and each DEM’s subcatchment area error. SEM had a weak but significant negative correlation with the mean and median slope. ASTER GDEM1 and GDEM2 were the only datasets that showed any significant correlations with the subcatchment environment variables, though these correlations were also weak. The 30 m posting ASTER GDEMs performed worse against the baseline than the other 30 m and 90 m datasets, showing that posting alone does not determine dataset quality. Our results show generally small errors for catchment delineations, though large errors are possible, particularly in the older ASTER and SRTM datasets.
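The headline statistics reduce to a few lines of NumPy; the area values below are invented solely to show the computation.

import numpy as np

baseline = 10780.0                                       # TanDEM-X 12 m area, km2 (illustrative)
areas = np.array([10785.2, 10771.9, 10790.4, 10768.3,
                  10801.7, 10755.0, 10793.8, 10774.6])   # other DEMs (invented)

mean_dev = (areas.mean() - baseline) / baseline * 100           # % of baseline
sem = areas.std(ddof=1) / np.sqrt(len(areas)) / baseline * 100  # SEM as % of baseline
print(f"Mean deviation: {mean_dev:+.2f}% of baseline, SEM: {sem:.2f}%")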
APA, Harvard, Vancouver, ISO, and other styles
43

Zhang, Yong, and Dapeng Wang. "A Cost-Sensitive Ensemble Method for Class-Imbalanced Datasets." Abstract and Applied Analysis 2013 (2013): 1–6. http://dx.doi.org/10.1155/2013/196256.

Full text
Abstract:
In imbalanced learning, resampling methods modify an imbalanced dataset to form a balanced one, and balanced datasets perform better than imbalanced datasets for many base classifiers. This paper proposes a cost-sensitive ensemble method based on a cost-sensitive support vector machine (SVM) and query-by-committee (QBC) to solve imbalanced data classification. The proposed method first divides the majority-class dataset into several subdatasets according to the proportion of imbalanced samples and trains subclassifiers using the AdaBoost method. Then, the proposed method generates candidate training samples by the QBC active learning method and uses a cost-sensitive SVM to learn the training samples. Experiments on 5 class-imbalanced datasets show that the proposed method achieves a higher area under the ROC curve (AUC), F-measure, and G-mean than many existing class-imbalanced learning methods.
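The cost-sensitive SVM component can be approximated in scikit-learn by weighting the minority class; this sketch on synthetic data computes the three reported metrics and omits the ensemble and QBC steps.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score, f1_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight penalizes minority-class errors more heavily (cost-sensitive).
svm = SVC(class_weight={0: 1, 1: 9}, probability=True).fit(X_tr, y_tr)
prob = svm.predict_proba(X_te)[:, 1]
pred = svm.predict(X_te)

tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
g_mean = np.sqrt(tp / (tp + fn) * tn / (tn + fp))   # sqrt(sensitivity * specificity)
print(f"AUC={roc_auc_score(y_te, prob):.3f}  F1={f1_score(y_te, pred):.3f}  G-mean={g_mean:.3f}")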
APA, Harvard, Vancouver, ISO, and other styles
44

Dudáš, Adam, and Bianka Modrovičová. "Visualization of Prediction Potential Hotspots in Multidimensional Datasets." JOIV : International Journal on Informatics Visualization 9, no. 1 (2025): 258. https://doi.org/10.62527/joiv.9.1.2477.

Full text
Abstract:
Correlation analysis and visual analysis of multidimensional datasets with the objective of identifying patterns and trends are essential elements of decision-making processes. Conventional visualization models in this area, such as correlation heatmaps, visually represent the value of the correlation coefficient measured between pairs of attributes of a multidimensional dataset, but they are hard to read when working with a large number of attributes. This study concerns the design and implementation of a visualization model that can be used to identify prediction potential hotspots in analysed datasets: parts of the dataset that are strongly correlated with a high number of attributes. The proposed model focuses on a graphical representation of such hotspots based on planar, multicomponent graphs, with the aim of supporting meta-analysis of large, multidimensional datasets. The implemented approach is evaluated in a case study on the analysis of an original cubic graph property dataset, in which several prediction potential hotspots of different correlation types are constructed. In addition to constructing the hotspots themselves, this study compares the results obtained with the graphical model to a conventional model used in the meta-analysis of multidimensional datasets, Shapley value explanations. The results presented in this study point to the need for a robust visualization framework for the analysis of correlation structures in multidimensional datasets and for visualization models based on virtual and augmented reality.
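The hotspot notion (an attribute strongly correlated with many others) can be prototyped in a few lines of pandas; the 0.7 threshold and the synthetic data are assumptions for illustration.

import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame(rng.normal(size=(300, 8)), columns=[f"a{i}" for i in range(8)])
df["a1"] = df["a0"] * 0.9 + rng.normal(0, 0.3, 300)    # plant some structure
df["a2"] = df["a0"] * -0.8 + rng.normal(0, 0.4, 300)

corr = df.corr().abs()
np.fill_diagonal(corr.values, 0.0)           # ignore self-correlation
strong = (corr >= 0.7).sum(axis=1)           # strong links per attribute
hotspots = strong[strong >= 2].sort_values(ascending=False)
print(hotspots)   # attributes that anchor a prediction potential hotspot

The proposed model goes further by laying out such attributes and their strong links as planar, multicomponent graphs, but the counting step above is the core of hotspot identification.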
APA, Harvard, Vancouver, ISO, and other styles
45

Jang, Ryoungwoo, Namkug Kim, Miso Jang, et al. "Assessment of the Robustness of Convolutional Neural Networks in Labeling Noise by Using Chest X-Ray Images From Multiple Centers." JMIR Medical Informatics 8, no. 8 (2020): e18089. http://dx.doi.org/10.2196/18089.

Full text
Abstract:
Background: Computer-aided diagnosis on chest x-ray images using deep learning is a widely studied modality in medicine. Many studies are based on public datasets, such as the National Institutes of Health (NIH) dataset and the Stanford CheXpert dataset. However, these datasets are preprocessed by classical natural language processing, which may cause a certain extent of label errors. Objective: This study aimed to investigate the robustness of deep convolutional neural networks (CNNs) for binary classification of posteroanterior chest x-rays under random incorrect labeling. Methods: We trained and validated the CNN architecture with different levels of label noise in 3 datasets, namely, Asan Medical Center-Seoul National University Bundang Hospital (AMC-SNUBH), NIH, and CheXpert, and tested the models with each test set. The disease in each chest x-ray in our dataset was confirmed by a thoracic radiologist using computed tomography (CT). Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were evaluated in each test. Randomly chosen chest x-rays from the public datasets were evaluated by 3 physicians and 1 thoracic radiologist. Results: Whereas the AUCs on the public NIH and CheXpert datasets did not drop significantly even at 16% label noise, the AUC on the AMC-SNUBH dataset decreased significantly from 2% label noise. Evaluation of the public datasets by 3 physicians and 1 thoracic radiologist showed an accuracy of 65%-80%. Conclusions: The deep learning–based computer-aided diagnosis model is sensitive to label noise, and computer-aided diagnosis with inaccurate labels is not credible. Furthermore, open datasets such as NIH and CheXpert need to be distilled before being used for deep learning–based computer-aided diagnosis.
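The noise-injection protocol can be emulated as follows; a logistic regression stands in for the CNN, and the noise levels echo those discussed in the abstract.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
for noise in (0.0, 0.02, 0.08, 0.16):                # fraction of flipped labels
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise
    y_noisy[flip] = 1 - y_noisy[flip]                # random incorrect labeling
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_noisy)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"label noise {noise:.0%}: test AUC {auc:.3f}")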
APA, Harvard, Vancouver, ISO, and other styles
46

van Otterloo, Sieuwert, and Pavlo Burda. "The Utrecht Housing dataset: A housing appraisal dataset." Computers and Society Research Journal 1 (2025): 1–11. https://doi.org/10.54822/qvhm1662.

Full text
Abstract:
This paper introduces a real-world dataset for analysing and predicting house prices. The dataset consists of actual data on the Dutch housing market collected in 2024 for a total of 153 houses in one city (Utrecht in The Netherlands). The dataset incorporates diverse variables on individual houses, including property characteristics (e.g., house type, build year, geolocation, area, energy label) and market metrics (e.g., asking price, final price). The data was collected from two public sources. The dataset has been created to help researchers and educators demonstrate machine learning principles on several problem types. It can be used for classification (energy label and energy efficiency) and regression/price estimation. There are ten original input features and one derived feature. The dataset can be freely used without restrictions under a Creative Commons license and is available via the open data platform Kaggle.
APA, Harvard, Vancouver, ISO, and other styles
47

Lieskovský, Juraj, and Dana Lieskovská. "Cropland Abandonment in Slovakia: Analysis and Comparison of Different Data Sources." Land 10, no. 4 (2021): 334. http://dx.doi.org/10.3390/land10040334.

Full text
Abstract:
This study compares different nationwide multi-temporal spatial data sources and analyzes the cropland area, cropland abandonment rates, and the transformation of cropland to other land cover/land use categories in Slovakia. Four multi-temporal land cover/land use data sources were used: the Historic Land Dynamics Assessment (HILDA), the Carpathian Historical Land Use Dataset (CHLUD), CORINE Land Cover (CLC) data, and Landsat image classification. We hypothesized that because of the different spatial, temporal, and thematic resolutions of the datasets, there would be differences in the resulting cropland abandonment rates. We validated the datasets, compared the differences, interpreted the results, and combined the information from the different datasets to form an overall picture of long-term cropland abandonment in Slovakia. The cropland area increased until the Second World War, but then decreased after the transition to the communist regime and sharply declined following the 1989 transition to an open market economy. A total of 49% of the cropland area has been transformed to grassland, 34% to forest, and 15% to urban areas. The CHLUD is the more reliable long-term dataset, and it records an average cropland abandonment of 19.65 km2/year for 1836–1937, 154.44 km2/year for 1938–1955, and 140.21 km2/year for 1956–2012. In comparison, Landsat, as a recent data source, records abandonment of 142.02 km2/year for 1985–2000 and 89.42 km2/year for 2000–2010. These rates, however, would be higher if the dataset contained urbanisation data and more precise information on afforestation. CORINE Land Cover reflects changes larger than 5 ha, and therefore the reported cropland abandonment rates are lower.
APA, Harvard, Vancouver, ISO, and other styles
48

Fuangfoo, Patcharasiri, and Krung Sinapiromsaran. "Parameter-Free Conglomerate nearest Neighbor Classifier Using Mass-Ratio-Variance Outlier Factors." International Journal of Machine Learning 13, no. 4 (2023): 158–62. http://dx.doi.org/10.18178/ijml.2023.13.4.1145.

Full text
Abstract:
Classification is an important area in machine learning in which the class of an instance is assigned by a classifier built from historical data with known classes. One of the popular classifiers is k-NN, which stands for “k-nearest neighbor” and requires a global parameter k to proceed. This global parameter may not be suitable for all instances. Naturally, instances may be situated in different regions of clusters, such as an interior instance placed inside a cluster, a border instance placed on the outskirts, or an outer instance placed far away from any cluster, and each requires a different number of neighbors. To automatically assign a different number of neighbors to each instance, the concept of scoring from anomaly detection research is adopted. The Mass-ratio-variance Outlier Factor, MOF, is selected as the scoring scheme for the number of neighbors of each instance. MOF gives the highest score to an instance placed very far from any cluster and the lowest score to an instance surrounded by other instances. This leads to the proposed conglomerate nearest neighbor classifier, which requires no parameter and assigns the appropriate number of neighbors to each instance according to its MOF ordering. Experimental results show that this classifier exhibits similar accuracy to the k-nearest neighbor algorithm with the best k over the synthesized datasets. Six UCI datasets, the QSAR dataset, the German dataset, the Cancer dataset, the Wholesale dataset, the Haberman dataset, and the Glass3 dataset are used in the experiment. This method outperforms the k-nearest neighbor algorithm on two UCI datasets, Wholesale and Glass3, and displays similar performance on the remaining datasets.
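The core idea, replacing the global k with a per-instance neighborhood size driven by an outlier score, can be sketched as below. The exact MOF formula is not reproduced here; scikit-learn's Local Outlier Factor stands in for it, and the score-to-k mapping is invented for illustration.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Stand-in outlier score (the paper uses MOF): higher = more isolated.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
score = -lof.negative_outlier_factor_

# Map each instance's score rank to its own neighborhood size; the direction
# and range of this mapping are illustrative assumptions.
ranks = score.argsort().argsort() / (len(X) - 1)
k_per_instance = (3 + ranks * 22).astype(int)           # k in [3, 25], invented

nn = NearestNeighbors(n_neighbors=26).fit(X)
_, idx = nn.kneighbors(X)
# Majority vote among each instance's own k neighbors (idx[:, 0] is the point itself).
pred = np.array([np.bincount(y[idx[i, 1:k + 1]]).argmax()
                 for i, k in enumerate(k_per_instance)])
print(f"agreement with true cluster labels: {(pred == y).mean():.2%}")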
APA, Harvard, Vancouver, ISO, and other styles
49

Chen, He, Zhenzhong Zeng, Jie Wu, et al. "Large Uncertainty on Forest Area Change in the Early 21st Century among Widely Used Global Land Cover Datasets." Remote Sensing 12, no. 21 (2020): 3502. http://dx.doi.org/10.3390/rs12213502.

Full text
Abstract:
Forests play an important role in the Earth’s system. Understanding the states and changes in global forests is vital for ecological assessments and forest policy guidance. However, there is no consensus on how global forests have changed based on current datasets. In this study, five global land cover datasets and the Global Forest Resources Assessments (FRA) were assessed to reveal uncertainties in global forest changes in the early 21st century. These datasets displayed substantial divergences in total area, spatial distribution, latitudinal profile, and annual area change from 2001 to 2012. These datasets also display completely divergent conclusions on forest area changes for different countries. Among the datasets, total forest area changes range from an increase of 1.7 × 10⁶ km² to a decrease of 1.6 × 10⁶ km². All the datasets show deforestation in the tropics. The accuracies of the datasets in detecting forest cover changes were evaluated with a global land cover validation dataset. The spatial patterns of accuracies are inconsistent among the datasets. This study calls for the development of a more accurate database to support forest policies and to contribute to global actions against climate change.
APA, Harvard, Vancouver, ISO, and other styles
50

Li, Xiaomin, Qi Hou, Jie Zhang, Suming Zhang, Xuexue Du, and Tangqi Zhao. "Applicability Evaluation of the Global Synthetic Tropical Cyclone Hazard Dataset in Coastal China." Journal of Marine Science and Engineering 12, no. 1 (2023): 73. http://dx.doi.org/10.3390/jmse12010073.

Full text
Abstract:
A tropical cyclone dataset is an important data source for tropical cyclone disaster research, and the evaluation of its applicability is a necessary prerequisite. The Global Synthetic Tropical Cyclone Hazard (GSTCH) dataset, released in 2018, contains 10,000 years of synthetic global tropical cyclone activity and has become accepted as a major data source for the study of global tropical cyclone hazards. On the basis of the authoritative Tropical Cyclone Best Track (TCBT) dataset from the China Meteorological Administration, this study evaluated the applicability of the GSTCH dataset in two regions: the Northwest Pacific and China’s coastal provinces. For the Northwest Pacific, the results show no significant differences in the means and standard deviations of landfall wind speed, landfall pressure, and annual occurrence number between the two datasets at the 95% confidence level. They also show that the cumulative distributions of central minimum pressure and central maximum wind speed along the track passed the Kolmogorov–Smirnov (K-S) test at the 95% confidence level, thereby verifying that the GSTCH dataset is consistent with the TCBT dataset at the sea-area scale. For China’s coastal provinces, the results show that the means or standard deviations of tropical cyclone characteristics between the two datasets were not significantly different in provinces other than Guangdong and Hainan, and further analysis revealed that the cumulative distributions of the tropical cyclone characteristics in Guangdong and Hainan provinces passed the K-S test at the 95% confidence level, thereby verifying that the GSTCH dataset is consistent with the TCBT dataset at the province scale. The applicability evaluation revealed no significant differences between most of the tropical cyclone characteristics in the TCBT and GSTCH datasets, and the GSTCH dataset is therefore a reliable data source for tropical cyclone hazard studies in China’s coastal areas.
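The two-sample Kolmogorov-Smirnov comparison used throughout the evaluation looks like this in SciPy; the wind-speed samples are synthetic placeholders for the TCBT and GSTCH records.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)
tcbt_wind = rng.weibull(2.0, 500) * 40       # observed max wind speeds (m/s), synthetic
gstch_wind = rng.weibull(2.0, 5000) * 40     # synthetic-dataset wind speeds

stat, p = ks_2samp(tcbt_wind, gstch_wind)
# Failing to reject at the 95% level means the distributions are consistent.
print(f"KS statistic={stat:.3f}, p={p:.3f}, consistent={p > 0.05}")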
APA, Harvard, Vancouver, ISO, and other styles
