Journal articles on the topic 'CatBoost algorithm'

Author: Grafiati

Published: 5 June 2025

Last updated: 9 July 2025

Consult the top 50 journal articles for your research on the topic 'CatBoost algorithm.'

1

Kong, Lingchao, Hongtao Liang, Guozhu Liu, and Shuo Liu. "Research on Wind Turbine Fault Detection Based on the Fusion of ASL-CatBoost and TtRSA." Sensors 23, no. 15 (2023): 6741. http://dx.doi.org/10.3390/s23156741.

Abstract:

The internal structure of wind turbines is intricate and precise, although the challenging working conditions often give rise to various operational faults. This study aims to address the limitations of traditional machine learning algorithms in wind turbine fault detection and the imbalance of positive and negative samples in the fault detection dataset. To achieve the real-time detection of wind turbine group faults and to capture wind turbine fault state information, an enhanced ASL-CatBoost algorithm is proposed. Additionally, a crawling animal search algorithm that incorporates the Tent chaotic mapping and t-distribution mutation strategy is introduced to assess the sensitivity of the ASL-CatBoost algorithm toward hyperparameters and the difficulty of manual hyperparameter setting. The effectiveness of the proposed hyperparameter optimization strategy, termed the TtRSA algorithm, is demonstrated through a comparison of traditional intelligent optimization algorithms using 11 benchmark test functions. When applied to the hyperparameter optimization of the ASL-CatBoost algorithm, the TtRSA-ASL-CatBoost algorithm exhibits notable enhancements in accuracy, recall, and other performance measures compared with the ASL-CatBoost algorithm and other ensemble learning algorithms. The experimental results affirm that the proposed algorithm model improvement strategy effectively enhances the wind turbine fault detection classification recognition rate.
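The Tent chaotic mapping mentioned in this abstract is commonly used to spread an optimizer's initial population over the search space. The following is a minimal sketch, assuming a skewed tent map with a hypothetical breakpoint `a = 0.7` and seed `x0 = 0.37`; the abstract does not state the paper's exact formulation, and the function names and the two-parameter search space are illustrative only.

```python
def tent_sequence(n, x0=0.37, a=0.7):
    """Return n values of a skewed tent map on [0, 1].

    Assumed form: x -> x / a if x < a, else (1 - x) / (1 - a).
    """
    values = []
    x = x0
    for _ in range(n):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        values.append(x)
    return values


def init_population(pop_size, bounds, x0=0.37, a=0.7):
    """Map a tent sequence onto per-dimension bounds [(low, high), ...]."""
    dim = len(bounds)
    chaos = tent_sequence(pop_size * dim, x0, a)
    population = []
    for i in range(pop_size):
        row = chaos[i * dim:(i + 1) * dim]
        population.append(
            [lo + c * (hi - lo) for c, (lo, hi) in zip(row, bounds)]
        )
    return population


# Hypothetical search space: learning rate and tree depth.
population = init_population(5, [(0.01, 0.3), (4.0, 10.0)])
```

A breakpoint other than 0.5 is used here because the symmetric tent map tends to collapse to zero in binary floating-point arithmetic.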

2

Babu, Mr M. Jeevan. "Mental Health Prediction Using Catboost Algorithm." International Journal for Research in Applied Science and Engineering Technology 12, no. 3 (2024): 3449–53. http://dx.doi.org/10.22214/ijraset.2024.59219.

Abstract:

This study investigates the application of the CatBoost algorithm in predicting mental health outcomes using the Python programming language. Mental health prediction is a critical area of research due to its significant impact on individuals and society. Traditional predictive modeling techniques often encounter challenges in handling the complex and high-dimensional data inherent in mental health datasets. CatBoost, a state-of-the-art gradient boosting algorithm, has shown promise in effectively addressing these challenges by handling categorical variables seamlessly and exhibiting robust performance in various domains. Leveraging its powerful capabilities, this study aims to develop predictive models for mental health outcomes utilizing a comprehensive dataset encompassing diverse socio-demographic, behavioural, and clinical factors. The predictive performance of the CatBoost algorithm will be evaluated and compared against other commonly used machine learning algorithms, demonstrating its effectiveness in accurately predicting mental health outcomes. This research contributes to the advancement of predictive modeling in mental health research and holds potential implications for personalized interventions and resource allocation in mental healthcare systems.
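One mechanism behind CatBoost's handling of categorical variables is the ordered target statistic: each row's category is encoded using only the targets of rows that precede it, which limits target leakage. The following is a simplified sketch of that idea, not the library's implementation. It assumes the rows are already in the (normally random) permutation order, and the smoothing weight `a` and the `prior` value are illustrative choices.

```python
def ordered_target_statistics(categories, targets, prior, a=1.0):
    """Encode each category from the targets of earlier rows only.

    encoded[i] = (sum of earlier targets with the same category + a * prior)
                 / (count of earlier rows with the same category + a)
    """
    encoded = []
    sums = {}
    counts = {}
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        n = counts.get(cat, 0)
        encoded.append((s + a * prior) / (n + a))
        # Update running statistics only after encoding the current row.
        sums[cat] = s + y
        counts[cat] = n + 1
    return encoded


# Hypothetical toy data: a binary outcome and one categorical feature.
cats = ["a", "b", "a", "a", "b"]
ys = [1, 0, 0, 1, 1]
encoded = ordered_target_statistics(cats, ys, prior=0.5)
# encoded == [0.5, 0.5, 0.75, 0.5, 0.25]
```

The first occurrence of each category falls back to the prior, because no earlier rows with that category exist.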

3

Luo, Mi, Yifu Wang, Yunhong Xie, et al. "Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass." Forests 12, no. 2 (2021): 216. http://dx.doi.org/10.3390/f12020216.

Abstract:

Increasing numbers of explanatory variables tend to result in information redundancy and “dimensional disaster” in the quantitative remote sensing of forest aboveground biomass (AGB). Feature selection of model factors is an effective method for improving the accuracy of AGB estimates. Machine learning algorithms are also widely used in AGB estimation, although little research has addressed the use of the categorical boosting algorithm (CatBoost) for AGB estimation. Both feature selection and regression for AGB estimation models are typically performed with the same machine learning algorithm, but there is no evidence to suggest that this is the best method. Therefore, the present study focuses on evaluating the performance of the CatBoost algorithm for AGB estimation and comparing the performance of different combinations of feature selection methods and machine learning algorithms. AGB estimation models of four forest types were developed based on Landsat OLI data using three feature selection methods (recursive feature elimination (RFE), variable selection using random forests (VSURF), and least absolute shrinkage and selection operator (LASSO)) and three machine learning algorithms (random forest regression (RFR), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost)). Feature selection had a significant influence on AGB estimation. RFE preserved the most informative features for AGB estimation and was superior to VSURF and LASSO. In addition, CatBoost improved the accuracy of the AGB estimation models compared with RFR and XGBoost. AGB estimation models using RFE for feature selection and CatBoost as the regression algorithm achieved the highest accuracy, with root mean square errors (RMSEs) of 26.54 Mg/ha for coniferous forest, 24.67 Mg/ha for broad-leaved forest, 22.62 Mg/ha for mixed forests, and 25.77 Mg/ha for all forests. 
The combination of RFE and CatBoost had better performance than the VSURF–RFR combination in which random forests were used for both feature selection and regression, indicating that feature selection and regression performed by a single machine learning algorithm may not always ensure optimal AGB estimation. Extending the application of new machine learning algorithms and feature selection methods is a promising way to improve the accuracy of AGB estimates.

4

Zhou, Fangrong, Hao Pan, Zhenyu Gao, et al. "Fire Prediction Based on CatBoost Algorithm." Mathematical Problems in Engineering 2021 (July 19, 2021): 1–9. http://dx.doi.org/10.1155/2021/1929137.

Abstract:

In recent years, increasingly severe wildfires have posed a significant threat to the safe and stable operation of transmission lines. Wildfire risk assessment and early warning have become an important research topic in power grid risk assessment. This study proposes a fire prediction model on the basis of the CatBoost algorithm to effectively predict the fire point. Five wildfire risk factors, including vegetation factors, meteorological factors, human factors, terrain factors, and land surface temperature, were combined using the feature selection method on the basis of the gradient boosting decision tree model and principal component analysis to achieve dimensionality reduction of redundant data and create a fire prediction model. The MODIS fire point product is used as the model evaluation data. The verification result uses the AUC value as the evaluation factor. The accuracy of the model is 0.82, and the AUC value is 0.83. The obtained fire point evaluation results are in good agreement with the actual fire points. Results show that this model can effectively predict the risk of wildfires.
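The abstract reports both accuracy and AUC. As a reminder of how the two differ, here is a small sketch that computes AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count as one half), alongside accuracy at a fixed threshold. The labels, scores, and the 0.5 threshold are made-up illustrations, not data from the paper.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at the given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for l, p in zip(labels, preds) if l == p)
    return correct / len(labels)


# Hypothetical fire / no-fire labels and predicted probabilities.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.3]
auc = roc_auc(labels, scores)        # 3 of 6 pairs ranked correctly: 0.5
acc = accuracy(labels, scores)       # 3 of 5 correct at threshold 0.5: 0.6
```

AUC depends only on how the scores rank the samples, whereas accuracy changes with the chosen threshold.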

5

Irfan, Muhammad, A. Alwadie, Muhammad Awais, et al. "Motor Bearings Fault Classification using CatBoost Classifier." Renewable Energy and Power Quality Journal 20 (September 2022): 454–57. http://dx.doi.org/10.24084/repqj20.339.

Abstract:

Induction motors are used in all industries and are the major element of energy consumption. Faults in a motor degrade its efficiency and result in higher energy consumption. Bearing faults are reported to be the major cause of motor breakdown, and many papers have focused on bearing fault diagnostics. However, low classification accuracy is the main hurdle in adopting the available fault classification algorithms. This paper presents a novel classification algorithm using the CatBoost classifier and time-domain features. The developed algorithm was tested on a laboratory test setup. A fault classification accuracy of 100% was achieved with the proposed method.

6

Wang, Dongming, Xing Xu, Xuewen Xia, and Heming Jia. "Interactive 3D Vase Design Based on Gradient Boosting Decision Trees." Algorithms 17, no. 9 (2024): 407. http://dx.doi.org/10.3390/a17090407.

Abstract:

Traditionally, ceramic design began with sketches on rough paper and later evolved into using CAD software for more complex designs and simulations. With technological advancements, optimization algorithms have gradually been introduced into ceramic design to enhance design efficiency and creative diversity. The use of Interactive Genetic Algorithms (IGAs) for ceramic design is a new approach, but an IGA requires a significant amount of user evaluation, which can result in user fatigue. To overcome this problem, this paper introduces the LightGBM algorithm and the CatBoost algorithm to improve the IGA because they have excellent predictive capabilities that can assist users in evaluations. The algorithms are also applied to a vase design platform for validation. First, bicubic Bézier surfaces are used for modeling, and the genetic encoding of the vase is designed with appropriate evolutionary operators selected. Second, user data from the online platform are collected to train and optimize the LightGBM and CatBoost algorithms. Finally, LightGBM and CatBoost are combined with an IGA and applied to the vase design platform to verify their effectiveness. Comparing the improved algorithm to traditional IGAs, KD trees, Random Forest, and XGBoost, it is found that IGAs improve with LightGBM, and CatBoost performs better overall, requiring fewer evaluations and less time. Its R2 is higher than other proxy models, achieving 0.816 and 0.839, respectively. The improved method proposed in this paper can effectively alleviate user fatigue and enhance the user experience in product design participation.

7

Hadianto, Agus, and Wiranto Herry Utomo. "CatBoost Optimization Using Recursive Feature Elimination." Jurnal Online Informatika 9, no. 2 (2024): 169–78. http://dx.doi.org/10.15575/join.v9i2.1324.

Abstract:

CatBoost is a powerful machine learning algorithm capable of classification and regression. Many studies focus on its applications, but few address how to enhance its performance, especially when using RFE for feature selection. This study examines CatBoost optimization for regression tasks by using Recursive Feature Elimination (RFE) for feature selection in combination with several regression algorithms. Furthermore, an Isolation Forest algorithm is employed during preprocessing to identify and eliminate outliers from the dataset. The experiment compares the CatBoost regression model's performance with and without RFE feature selection. The results indicate that CatBoost with RFE, which selects features using Random Forests, performs better than the baseline model without feature selection. CatBoost-RFE outperformed the baseline with notable gains of over 48.6% in training time, 8.2% in RMSE score, and 1.3% in R2 score. Furthermore, compared to AdaBoost, Gradient Boosting, XGBoost, and artificial neural networks (ANN), it demonstrated better prediction accuracy. The CatBoost improvement has substantial implications for predicting the exhaust temperature in a coal-fired power plant.

8

Qiu, Zhaobin, Ying Qiao, Wanyuan Shi, and Xiaoqian Liu. "A robust framework for enhancing cardiovascular disease risk prediction using an optimized category boosting model." Mathematical Biosciences and Engineering 21, no. 2 (2024): 2943–69. http://dx.doi.org/10.3934/mbe.2024131.

Abstract:

Cardiovascular disease (CVD) is a leading cause of mortality worldwide, and it is of utmost importance to accurately assess the risk of cardiovascular disease for prevention and intervention purposes. In recent years, machine learning has shown significant advancements in the field of cardiovascular disease risk prediction. In this context, we propose a novel framework known as CVD-OCSCatBoost, designed for the precise prediction of cardiovascular disease risk and the assessment of various risk factors. The framework utilizes Lasso regression for feature selection and incorporates an optimized category-boosting tree (CatBoost) model. Furthermore, we propose the opposition-based learning cuckoo search (OCS) algorithm. By integrating OCS with the CatBoost model, our objective is to develop OCSCatBoost, an enhanced classifier offering improved accuracy and efficiency in predicting CVD. Extensive comparisons with popular algorithms like the particle swarm optimization (PSO) algorithm, the seagull optimization algorithm (SOA), the cuckoo search algorithm (CS), K-nearest-neighbor classification, decision tree, logistic regression, grid-search support vector machine (SVM), grid-search XGBoost, default CatBoost, and grid-search CatBoost validate the efficacy of the OCSCatBoost algorithm. The experimental results demonstrate that the OCSCatBoost model achieves superior performance compared to other models, with overall accuracy, recall, and AUC values of 73.67%, 72.17%, and 0.8024, respectively. These outcomes highlight the potential of CVD-OCSCatBoost for improving cardiovascular disease risk prediction.
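Opposition-based learning, named in this abstract, is usually described as evaluating each candidate together with its "opposite" point, `low + high - x`, and keeping the better ones. The following is a minimal sketch of that initialization step only. How the paper combines it with cuckoo search is not stated in the abstract, and the bounds, the stand-in objective `sphere`, and the function names are assumptions for illustration.

```python
import random


def opposite(point, bounds):
    """Opposite point within the bounds: low + high - x per dimension."""
    return [lo + hi - x for x, (lo, hi) in zip(point, bounds)]


def obl_initialize(pop_size, bounds, objective, rng):
    """Sample candidates, add their opposites, keep the best pop_size."""
    candidates = [
        [rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)
    ]
    candidates += [opposite(c, bounds) for c in candidates]
    candidates.sort(key=objective)  # lower objective value is better
    return candidates[:pop_size]


def sphere(x):
    """Stand-in objective; a real use would score a trained model."""
    return sum(v * v for v in x)


# Hypothetical two-dimensional search space with asymmetric bounds.
bounds = [(-2.0, 6.0), (0.0, 10.0)]
population = obl_initialize(6, bounds, sphere, random.Random(0))
```

In a hyperparameter-tuning setting the objective would typically be a validation loss, which makes each evaluation expensive; the sketch doubles the number of evaluations at initialization in exchange for a better starting population.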

9

Nguyen, Thuan Minh, Hanh Hong-Phuc Vo, and Myungsik Yoo. "Enhancing Intrusion Detection in Wireless Sensor Networks Using a GSWO-CatBoost Approach." Sensors 24, no. 11 (2024): 3339. http://dx.doi.org/10.3390/s24113339.

Abstract:

Intrusion detection systems (IDSs) in wireless sensor networks (WSNs) rely heavily on effective feature selection (FS) for enhanced efficacy. This study proposes a novel approach called Genetic Sacrificial Whale Optimization (GSWO) to address the limitations of conventional methods. GSWO combines a genetic algorithm (GA) and whale optimization algorithms (WOA) modified by applying a new three-population division strategy with a proposed conditional inherited choice (CIC) to overcome premature convergence in WOA. The proposed approach achieves a balance between exploration and exploitation and enhances global search abilities. Additionally, the CatBoost model is employed for classification, effectively handling categorical data with complex patterns. A new technique for fine-tuning CatBoost’s hyperparameters is introduced, using effective quantization and the GSWO strategy. Extensive experimentation on various datasets demonstrates the superiority of GSWO-CatBoost, achieving higher accuracy rates on the WSN-DS, WSNBFSF, NSL-KDD, and CICIDS2017 datasets than the existing approaches. The comprehensive evaluations highlight the real-time applicability and accuracy of the proposed method across diverse data sources, including specialized WSN datasets and established benchmarks. Specifically, our GSWO-CatBoost method has an inference time nearly 100 times faster than deep learning methods while achieving high accuracy rates of 99.65%, 99.99%, 99.76%, and 99.74% for WSN-DS, WSNBFSF, NSL-KDD, and CICIDS2017, respectively.

10

Liu, Kuirong, Guanlin Wang, Dajun Mao, and Junqing Huang. "A Hybrid Fault Early-Warning Method Based on Improved Bees Algorithm-Optimized Categorical Boosting and Kernel Density Estimation." Processes 13, no. 5 (2025): 1460. https://doi.org/10.3390/pr13051460.

Abstract:

In the context of intelligent manufacturing, equipment fault early-warning technology has become a critical support for ensuring the continuity and safety of industrial production. However, with the increasing complexity of modern industrial equipment structures and the growing coupling of operational states, traditional fault warning models face significant challenges in feature recognition accuracy and adaptability. To address these issues, this study proposes a hybrid fault early-warning framework that integrates an improved bees algorithm (IBA) with a categorical boosting (CatBoost) model and kernel density estimation (KDE). The proposed framework first develops the IBA by integrating Latin Hypercube Sampling, a multi-perturbation neighborhood search strategy, and a dynamic scout bee adjustment strategy, which effectively overcomes the conventional bees algorithm (BA)’s tendency to fall into local optima. The IBA is then employed to achieve global optimization of CatBoost’s key hyperparameters. The optimized CatBoost model is subsequently used to predict equipment operational data. Finally, the KDE method is applied to the prediction residuals to determine fault thresholds. An empirical study on a deflection fault in the valve position sensor connecting rod of the mineral oil system in a gas compressor station shows that the proposed method can issue early-warning signals two hours in advance and outperforms existing advanced algorithms in key indicators such as root mean square error (RMSE), coefficient of determination (R2) and mean absolute percentage error (MAPE). Furthermore, ablation experiments verify the effectiveness of the strategies in IBA and their contribution to CatBoost hyperparameter optimization. The proposed method significantly improves the accuracy and reliability of fault prediction in complex industrial environments.
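The final step described in this abstract, applying KDE to prediction residuals to set a fault threshold, can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it uses a Gaussian kernel, a normal-reference rule-of-thumb bandwidth, absolute residuals, and a hypothetical 99% confidence level, and the residual values are made up.

```python
import math
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference bandwidth h = 1.06 * sigma * n^(-1/5) (assumed choice)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)


def kde_cdf(x, samples, h):
    """CDF of a Gaussian KDE: the mean of normal CDFs centred on each sample."""
    total = sum(
        0.5 * (1.0 + math.erf((x - s) / (h * math.sqrt(2.0)))) for s in samples
    )
    return total / len(samples)


def kde_threshold(samples, h, confidence=0.99, tol=1e-6):
    """Find by bisection the value whose KDE CDF equals the confidence level."""
    lo = min(samples) - 10.0 * h
    hi = max(samples) + 10.0 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, h) < confidence:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical absolute residuals from a healthy operating period.
abs_residuals = [0.12, 0.05, 0.30, 0.08, 0.21, 0.02, 0.15, 0.11, 0.05, 0.02]
h = rule_of_thumb_bandwidth(abs_residuals)
threshold = kde_threshold(abs_residuals, h)

# New residuals above the threshold would raise an early warning.
alarms = [r for r in [0.10, 0.95, 0.07] if r > threshold]
```

Because the threshold is read off the estimated residual distribution, it does not require the residuals to be normally distributed, which is one common motivation for using KDE here.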

11

Zhang, Yu, Qingrui Chang, Yi Chen, Yanfu Liu, Danyao Jiang, and Zijuan Zhang. "Hyperspectral Estimation of Chlorophyll Content in Apple Tree Leaf Based on Feature Band Selection and the CatBoost Model." Agronomy 13, no. 8 (2023): 2075. http://dx.doi.org/10.3390/agronomy13082075.

Abstract:

Leaf chlorophyll content (LCC) is a crucial indicator of nutrition in apple trees and can be applied to assess their growth status. Hyperspectral data can provide an important means for detecting the LCC in apple trees. In this study, hyperspectral data and the measured LCC were obtained. The original spectrum (OR) was pretreated using some spectral transformations. Feature bands were selected based on the competitive adaptive reweighted sampling (CARS) algorithm, random frog (RF) algorithm, elastic net (EN) algorithm, and the EN-RF and EN-CARS algorithms. Partial least squares regression (PLSR), random forest regression (RFR), and the CatBoost algorithm were used before and after grid search parameter optimization to estimate the LCC. The results revealed the following: (1) The spectrum after second derivative (SD) transformation had the highest correlation with LCC (–0.929); moreover, the SD-based model produced the highest accuracy, making SD an effective spectrum pretreatment method for apple tree LCC estimation. (2) Compared with the single band selection algorithm, the EN-RF algorithm had a better dimension reduction effect, and the modeling accuracy was generally higher. (3) CatBoost after grid search optimization had the best estimation effect, and the validation set of the SD-EN-CARS-CatBoost model after parameter optimization had the highest estimation accuracy, with the determination coefficient (R2), root mean square error (RMSE), and relative prediction deviation (RPD) reaching 0.923, 2.472, and 3.64, respectively. As such, the optimized SD-EN-CARS-CatBoost model, with its high accuracy and reliability, can be used to monitor the growth of apple trees, support the intelligent management of apple orchards, and facilitate the economic development of the fruit industry.
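The three validation metrics quoted in this abstract (R2, RMSE, and RPD) can be computed as in the sketch below. RPD is taken here as the sample standard deviation of the measured values divided by the RMSE, which is a common definition; the abstract does not say which variant the authors used, and the sample values are invented for illustration.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Relative prediction deviation: SD of measurements / RMSE (assumed form)."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


# Hypothetical measured and predicted chlorophyll values.
measured = [40.0, 45.0, 50.0, 55.0, 60.0]
predicted = [42.0, 44.0, 50.0, 57.0, 59.0]

print(round(rmse(measured, predicted), 3))       # 1.414
print(round(r_squared(measured, predicted), 3))  # 0.96
print(round(rpd(measured, predicted), 3))        # 5.59
```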

12

Zhuang, Zibo, Haosen Li, Jingyuan Shao, Pak-Wai Chan, and Hongda Tai. "Identification of Airline Turbulence Using WOA-CatBoost Algorithm in Airborne Quick Access Record (QAR) Data." Applied Sciences 14, no. 11 (2024): 4419. http://dx.doi.org/10.3390/app14114419.

Abstract:

Turbulence is a significant operational aviation safety hazard during all phases of flight. There is an urgent need for a method of airline turbulence identification in aviation systems to avoid turbulence hazards to aircraft during flight. Integrating flight data and machine learning significantly enhances the efficacy of turbulence identification. Nevertheless, present studies encounter issues including unstable model performance, challenges in data feature extraction, and parameter optimization. Hence, it is imperative to propose a superior approach to enhance the accuracy of turbulence identification along airline. The paper presents a combined swarm intelligence and machine learning model based on data mining for identifying airline turbulence. Based on the theory of swarm-intelligence-based optimization algorithm, the optimal parameters of Categorical Boosting (CatBoost) are obtained by introducing the whale optimization algorithm (WOA), and the corresponding WOA-CatBoost fusion model is established. Then, the Recursive Feature Elimination algorithm (RFE) is used to eliminate the data with lower feature weights, extract the effective features of the data, and the combination with the WOA brings robust optimization effects, whereby the accuracy of CatBoost increased by 11%. The WOA-CatBoost model can perform accurate turbulence identification from QAR data, comparable to that with established EDR approaches and outperforms traditional machine learning models. This discovery highlights the effectiveness of combining swarm intelligence and machine learning algorithms in turbulence monitoring systems to improve aviation safety.

13

Khamis, Gamal Saad Mohamed, Zakariya M. S. Mohammed, Sultan Munadi Alanazi, Ashraf F. A. Mahmoud, Faroug A. Abdalla, and Sana Abdelaziz Bkheet. "Prediction of Myocardial Infarction Complications using Gradient Boosting." Engineering, Technology & Applied Science Research 14, no. 6 (2024): 18550–56. https://doi.org/10.48084/etasr.9076.

Abstract:

Cardiovascular diseases (CVDs) are the leading cause of death worldwide, representing a significant public health challenge. Myocardial Infarction (MI), a severe manifestation of CVDs, contributes substantially to these fatalities. Machine learning holds great promise for predicting MI. This study explores the potential of Gradient Boosting (GB) techniques for this purpose, explicitly focusing on CatBoost, LightGBM, XGBoost, and XGBoost Random Forest. The study leverages GB's embedded feature selection, missing-value handling, and hyperparameter tuning capabilities. Performance was evaluated using multiple metrics: Area Under the Curve (AUC), classification accuracy, F1 score, precision, recall, and Matthews Correlation Coefficient (MCC). A probabilistic comparison matrix was used to assess the relative performance of the GB models. The results demonstrate the superiority of CatBoost, achieving a classification accuracy of 94.9%, an AUC of 0.992, a recall of 94.9%, and an MCC of 0.82. The probabilistic comparison further confirms CatBoost's superior performance. These findings contribute to MI prediction, highlighting the predictive potential of the CatBoost algorithm and ultimately aiding the fight against MI to achieve better patient outcomes.
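The Matthews Correlation Coefficient reported in this abstract can sit well below accuracy on imbalanced data, because it uses all four cells of the confusion matrix. Here is a small sketch for the binary case; the study's task may be multi-class, which the abstract does not specify, and the labels are invented for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient; returns 0.0 if the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical labels: 1 = complication, 0 = no complication.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

score = mcc(y_true, y_pred)  # tp=3, tn=3, fp=1, fn=1 gives 8 / 16 = 0.5
```

In this toy case accuracy is 0.75 while MCC is 0.5, which illustrates why the two numbers in an abstract need not be close.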

14

Angga Maulana Akbar, Rudy Herteno, Setyo Wahyu Saputro, Mohammad Reza Faisal, and Radityo Adi Nugroho. "Optimizing Software Defect Prediction Models: Integrating Hybrid Grey Wolf and Particle Swarm Optimization for Enhanced Feature Selection with Popular Gradient Boosting Algorithm." Journal of Electronics, Electromedical Engineering, and Medical Informatics 6, no. 2 (2024): 169–81. http://dx.doi.org/10.35882/jeeemi.v6i2.388.

Abstract:

Software defects, also referred to as software bugs, are anomalies or flaws in a computer program that cause software to behave unexpectedly or produce incorrect results. These defects can manifest in various forms, including coding errors, design flaws, and logic mistakes, and they can emerge at any stage of the software development lifecycle. Traditional prediction models usually have lower prediction performance. To address this issue, this paper proposes a novel prediction model using Hybrid Grey Wolf Optimizer and Particle Swarm Optimization (HGWOPSO). This research aims to determine whether the Hybrid Grey Wolf and Particle Swarm Optimization model could potentially improve the effectiveness of software defect prediction compared to base PSO and GWO algorithms without hybridization. Furthermore, this study aims to determine the effectiveness of different gradient boosting classification algorithms when combined with HGWOPSO feature selection in predicting software defects. The study utilizes 13 NASA MDP datasets. These datasets are divided into testing and training data using 10-fold cross-validation. After the data are divided, the SMOTE technique is applied to the training data. This technique generates synthetic samples to balance the dataset, ensuring better performance of the predictive model. Subsequently, feature selection is conducted using the HGWOPSO algorithm. Each subset of the NASA MDP dataset is processed by three boosting classification algorithms, namely XGBoost, LightGBM, and CatBoost. Performance evaluation is based on the Area under the ROC Curve (AUC) value. Average AUC values yielded by HGWOPSO XGBoost, HGWOPSO LightGBM, and HGWOPSO CatBoost are 0.891, 0.881, and 0.894, respectively. Results of this study indicated that utilizing the HGWOPSO algorithm improved AUC performance compared to the base GWO and PSO algorithms. Specifically, HGWOPSO CatBoost achieved the highest AUC of 0.894. This represents a 6.5% increase in AUC with a significance value of 0.00552 compared to PSO CatBoost, and a 6.3% AUC increase with a significance value of 0.00148 compared to GWO CatBoost. This study demonstrated that HGWOPSO significantly improves the performance of software defect prediction. The implication of this research is to enhance software defect prediction models by incorporating hybrid optimization techniques and combining them with gradient boosting algorithms, which can potentially identify and address defects more accurately.
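The abstract applies SMOTE to the training data only, after the cross-validation split, so that no synthetic samples leak into the test folds. The sketch below illustrates that ordering with a simplified SMOTE (interpolation between a minority sample and one of its nearest minority neighbours). It assumes label 1 is the minority class and uses a toy dataset with 5 folds rather than the paper's 10; all names and values are illustrative.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k, rng):
    """Create n_new points by interpolating towards a near minority neighbour."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist2(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def balanced_training_sets(X, y, k_folds, rng, k_neighbours=3):
    """Yield (X_train, y_train, test_idx), oversampling the training part only."""
    for test_idx in k_fold_indices(len(X), k_folds, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        minority = [x for x, label in zip(X_train, y_train) if label == 1]
        n_new = (len(y_train) - len(minority)) - len(minority)
        if n_new > 0 and len(minority) >= 2:
            new_points = smote(minority, n_new, k_neighbours, rng)
            X_train = X_train + new_points
            y_train = y_train + [1] * len(new_points)
        yield X_train, y_train, test_idx


# Hypothetical imbalanced toy data: 22 clean modules, 8 defective ones.
data_rng = random.Random(0)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(22)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(8)]
y = [0] * 22 + [1] * 8

splits = list(balanced_training_sets(X, y, 5, random.Random(1)))
```

Each test fold keeps its original class imbalance, so a metric such as AUC is measured on data the model could realistically encounter.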

15

Cao, Lianjun, Xiaobing He, Sheng Chen, and Luming Fang. "Assessing Forest Quality through Forest Growth Potential, an Index Based on Improved CatBoost Machine Learning." Sustainability 15, no. 11 (2023): 8888. http://dx.doi.org/10.3390/su15118888.

Abstract:

Human activities have always depended on nature, and forests are an important part of this; the determination and improvement of forest quality is therefore highly significant. Currently, domestic and foreign research on forest quality focuses on the current states of forests. We propose a new research direction based on the future states. By referencing and analyzing the forest quality standards of domestic and foreign experts and institutions, the concept and model for calculating forest growth potential were constructed. Forest growth potential is a new forest quality indicator. Based on data from 110,000 forest resource subcompartments in Lin’an and on Landsat 8 satellite remote sensing data, the unit volume was predicted using three machine-learning algorithms: stochastic gradient descent (SGD), the ensemble learning algorithm CatBoost, and a deep learning CNN. The CatBoost model was improved using Optuna; the improved CatBoost algorithm was then selected through evaluation indicators for the prediction of forest volume and finally incorporated into the calculation model for forest growth-potential value. The forest growth-potential value was calculated, and an accurate forest quality improvement scheme based on the subcompartments is preliminarily discussed. The successful calculation of forest growth potential values has a certain reference significance, providing guidance for accurately improving forest quality and forest management. The improved CatBoost calculation model is effective in the prediction of forest growth potential, and the determination coefficient R2 reaches 0.89, a value that compares favorably with those in other studies.

16

Huang, Rong, Jimin Ni, Pengli Qiao, Qiwei Wang, Xiuyong Shi, and Qi Yin. "An Explainable Prediction Model for Aerodynamic Noise of an Engine Turbocharger Compressor Using an Ensemble Learning and Shapley Additive Explanations Approach." Sustainability 15, no. 18 (2023): 13405. http://dx.doi.org/10.3390/su151813405.

Abstract:

In the fields of environment and transportation, the aerodynamic noise emissions emitted from heavy-duty diesel engine turbocharger compressors are of great harm to the environment and human health, which needs to be addressed urgently. However, for the study of compressor aerodynamic noise, particularly at the full operating range, experimental or numerical simulation methods are costly or long-period, which do not match engineering requirements. To fill this gap, a method based on ensemble learning is proposed to predict aerodynamic noise. In this study, 10,773 datasets were collected to establish and normalize an aerodynamic noise dataset. Four ensemble learning algorithms (random forest, extreme gradient boosting, categorical boosting (CatBoost) and light gradient boosting machine) were applied to establish the mapping functions between the total sound pressure level (SPL) of the aerodynamic noise and the speed, mass flow rate, pressure ratio and frequency of the compressor. The results showed that, among the four models, the CatBoost model had the best prediction performance with a correlation coefficient and root mean square error of 0.984798 and 0.000628, respectively. In addition, the error between the predicted total SPL and the observed value was the smallest, at only 0.37%. Therefore, the method based on the CatBoost algorithm to predict aerodynamic noise is proposed. For different operating points of the compressor, the CatBoost model had high prediction accuracy. The noise contour cloud in the predicted MAP from the CatBoost model was better at characterizing the variation in the total SPL. The maximum and minimum total SPLs were 122.53 dB and 115.42 dB, respectively. To further interpret the model, an analysis conducted by applying the Shapley Additive Explanation algorithm showed that frequency significantly affected the SPL, while the speed, mass flow rate and pressure ratio had little effect on the SPL. 
Therefore, the proposed method based on the CatBoost algorithm could well predict aerodynamic noise emissions from a turbocharger compressor.

17

Peng, Yan, Yue Liu, Jie Wang, and Xiao Li. "A Novel Framework for Risk Warning That Utilizes an Improved Generative Adversarial Network and Categorical Boosting." Electronics 13, no. 8 (2024): 1538. http://dx.doi.org/10.3390/electronics13081538.

Abstract:

To address the problems of inadequate training and low precision in prediction models with small-sample-size and incomplete data, a novel SALGAN-CatBoost-SSAGA framework is proposed in this paper. We utilize the standard K-nearest neighbor algorithm to interpolate missing values in incomplete data, and employ EllipticEnvelope to identify outliers. SALGAN, a generative adversarial network with a self-attention mechanism of label awareness, is utilized to generate virtual samples and increase the diversity of the training data for model training. To avoid local optima and improve the accuracy and stability of the standard CatBoost prediction model, an improved Sparrow Search Algorithm (SSA)–Genetic Algorithm (GA) combination is adopted to construct an effective CatBoost-SSAGA model for risk warning, in which the SSAGA is used for the global parameter optimization of CatBoost. A UCI heart disease dataset is used for heart disease risk prediction. The experimental results show the superiority of the proposed model in terms of accuracy, precision, recall, and F1-values, as well as the AUC.

18

AJEWOLE, Titus O., Ganiyat O. ADEYEMO, Olatunde OLADEPO, Abdulsemiu A. OLAWUYI, and Kabiru A. HASSAN. "ENHANCING ENERGY SECURITY: DEVELOPMENT OF ADVANCED AI-BASED ENERGY THEFT DETECTION MODELS USING SMART METER DATA." OAUSTECH Journal of Engineering and Intelligent Technology 1, no. 1 (2025): 178–85. https://doi.org/10.36108/ojeit/5202.10.0191.

Abstract:

This study proposes an advanced energy theft detection system using machine learning models, namely Firefly Algorithm-CatBoost (FA-CatBoost) and Genetic Algorithm-CatBoost (GA-CatBoost), to accurately identify fraudulent energy consumption patterns in power grids. A daily power consumption dataset was employed to train and evaluate the models. The models were evaluated using performance metrics such as accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), and Receiver Operating Characteristic Area Under the Curve (ROC-AUC). The FA-CatBoost model demonstrated exceptional performance, achieving an accuracy of 97.6%, precision of 97.3%, recall of 98.0%, F1-score of 97.7%, MCC of 95.3%, and ROC-AUC of 97.6% during training. The GA-CatBoost model also exhibited promising results with an accuracy of 84.5%, precision of 86.0%, recall of 82.4%, F1-score of 84.2%, MCC of 69.1%, and ROC-AUC of 84.5% during training. The proposed energy theft detection system showcases the effectiveness of advanced machine learning techniques in identifying fraudulent activities within power grids, contributing to the enhancement of energy security and grid resilience. The findings suggest that power utility companies should consider integrating these validated models into their fraud detection systems and invest in data analytics infrastructure to combat energy theft effectively.
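
For readers unfamiliar with the metrics listed in this abstract, the following is a minimal sketch of how accuracy, precision, recall, F1-score and MCC follow from confusion-matrix counts. It is not code from the paper; the function name `binary_metrics` and the counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC lies in [-1, 1]; the abstract reports it scaled to a percentage.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts (not taken from the paper).
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=95)
print(metrics)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported for imbalanced problems such as theft detection.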

19

Lapina, Maria Anatolyevna, Vitaliya Valentinovna Movzalevskaya, Marina Evgenievna Tokmakova, Mikhail Grigorievich Babenko, and Viktor Pavlovich Kochin. "Intelligent Algorithms for Detecting Attacks in the Web Environment." Proceedings of the Institute for System Programming of the RAS 36, no. 4 (2024): 99–116. http://dx.doi.org/10.15514/ispras-2024-36(4)-8.

Abstract:

The article analyzes the use of machine learning algorithms to detect attacks that use a custom web environment or the functionality of user applications. Supervised learning and clustering algorithms are considered. The dataset is a sample of online shopping transactions collected by an e-commerce retailer and contains 39,221 transactions. To detect attacks in the web environment, the best-performing implementations of machine learning algorithms were selected after review and comparative analysis, and the most effective algorithm for detecting fraudulent transactions was determined, using accuracy and running time as criteria. The accuracy of detecting fraudulent transactions is 100% for the Random Forest, GB (Scikit-learn) and GB (CatBoost) algorithms, and 99.9% for the KD-trees algorithm. The gradient boosting algorithm in the CatBoost implementation is 4.2 times faster than Random Forest, 2.4 times faster than GB Scikit-learn, 1.2 times faster than GB without using the cat_features parameter, 41.9 times faster than k-dimensional trees, and 66.8 times faster than DBSCAN. The data obtained for each method are presented in tables. In this work, the algorithms are evaluated by training time, by the characteristics from the Confusion matrix and Classification Report for classification algorithms, and by fowlkes_mallows_score, rand_score, adjusted_rand_score, Homogeneity, Completeness and V-measure for clustering algorithms.
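
Two of the clustering scores named in this abstract, `rand_score` and `fowlkes_mallows_score`, are pair-counting measures. The sketch below re-derives them in plain Python to show what they measure; it is an illustration under that assumption, not the article's code, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def pair_counts(labels_true, labels_pred):
    """Classify every pair of samples by whether it shares a cluster
    in the reference labeling and in the predicted labeling."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rand_index(labels_true, labels_pred):
    """Fraction of pairs on which the two labelings agree."""
    tp, fp, fn, tn = pair_counts(labels_true, labels_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def fowlkes_mallows(labels_true, labels_pred):
    """Geometric mean of pairwise precision and pairwise recall."""
    tp, fp, fn, _ = pair_counts(labels_true, labels_pred)
    if tp == 0:
        return 0.0
    return tp / math.sqrt((tp + fp) * (tp + fn))


# Toy labelings: the second reference cluster is split in the prediction.
ri = rand_index([0, 0, 1, 1], [0, 0, 1, 2])
fm = fowlkes_mallows([0, 0, 1, 1], [0, 0, 1, 2])
print(round(ri, 4), round(fm, 4))
```

The pairwise loop is quadratic in the number of samples, so this form is only suitable for small examples.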

20

Xing, Fang, Hui Li, and Tianyu Li. "Deformation Modeling and Prediction of Concrete Dam Using Observed Air Temperature and Enhanced CatBoost Algorithm." Water 16, no. 23 (2024): 3341. http://dx.doi.org/10.3390/w16233341.

Abstract:

Accurate prediction of concrete dam deformation is essential for ensuring structural safety and operational efficiency. This study presents a novel approach for monitoring and predicting concrete dam deformation using observed air temperature data, intelligent optimization, and machine learning techniques. To address the limitations of traditional statistical models in simulating the thermal effects on dam body deformation, this study proposes an improved hydraulic–air temperature–time (HTairT) deformation monitoring model. This model leverages long-term air temperature data and its lagged terms as critical input variables, enabling a more comprehensive understanding of thermal impacts on dam deformation. To capture the complex, nonlinear relationships between environmental factors and dam deformation behavior, we introduce the high-performance CatBoost gradient-boosting algorithm as a regressor. An enhanced Particle Swarm Optimization (PSO) algorithm is utilized for optimizing CatBoost’s parameters, enhancing the model’s predictive accuracy. A high concrete dam, currently in service, is selected as the case study, where two representative deformation monitoring points are used for validation. This research fills a gap by combining CatBoost with an optimized PSO in a deformation monitoring model, providing a novel approach that improves predictive reliability in long-term dam safety monitoring. Experimental results show that the enhanced PSO-optimized CatBoost algorithm achieves higher R² and lower MSE and MAE values at multiple monitoring points compared with other benchmark methods. Moreover, the importance of factors affecting deformation can be identified using the proposed method, and experimental results indicate that water level and average air temperature of 1–2 days, 3–7 days, and 30–60 days are key factors affecting the deformation of high concrete arch dams.
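
The lagged air-temperature inputs mentioned in this abstract can be pictured as window averages over preceding days. The sketch below assumes that "average air temperature of 1–2 days" means the mean over the days 1 to 2 days before the observation date; that reading, the function names and the feature names are assumptions, not details confirmed by the abstract.

```python
def lag_window_mean(series, t, start_lag, end_lag):
    """Mean of `series` over days t-end_lag .. t-start_lag (inclusive).

    Returns None when the window would reach before the start of the record.
    """
    first = t - end_lag
    last = t - start_lag
    if first < 0:
        return None
    window = series[first:last + 1]
    return sum(window) / len(window)


def temperature_features(air_temp, t, windows=((1, 2), (3, 7), (30, 60))):
    """Build one averaged-temperature feature per lag window for day t."""
    return {f"T_{a}_{b}": lag_window_mean(air_temp, t, a, b)
            for a, b in windows}


# Hypothetical daily record: temperature rises by one degree per day.
air_temp = list(range(100))
features = temperature_features(air_temp, t=80)
print(features)
```

Features built this way would be passed to the regressor alongside the hydraulic and time terms; early days without a full 60-day history yield None and would need to be dropped or imputed.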

21

Chen, Xuan. "Intelligent Employment Forecasting Model Based On Catboost Algorithm." Procedia Computer Science 262 (2025): 596–603. https://doi.org/10.1016/j.procs.2025.05.090.

22

Gonçalves Freitas, Lucas José, Pamella Sada Dias Edokawa, Thaís Carvalho Valadares Rodrigues, Ariane Hayana Thomé de Farias, and Euler Rodrigues de Alencar. "Catboost Algorithm Application in Legal Texts and UN 2030 Agenda." Revista de Informática Teórica e Aplicada 30, no. 2 (2023): 51–58. https://doi.org/10.22456/2175-2745.128836.

Abstract:

This article evaluates the application of the Catboost algorithm for automatic classification of legal texts in The United Nations (UN) 2030 Agenda for Sustainable Development Goals (SDGs). The task consists of labeling texts from initial petitions and rulings based on identifying topics related to the objectives of the 2030 Agenda, which include sustainable development, quality education, gender equality, and preservation of the environment, among other topics of interest to UN member countries. This work aims to help Judicial System employees in the case management task, an activity that is manual and repetitive. Since the Catboost algorithm allows joining textual, numerical and categorical features in the same classification model, the proposed approach adds to the classification algorithm traditional metadata about legal processes, such as the Supreme Court Class and Field of Law. The main contributions of this work are: analysis of metadata in machine learning flows and evaluation of the Catboost algorithm for textual classification in legal contexts.

23

Aflaha, Rahmina Ulfah, Rudy Herteno, Mohammad Reza Faisal, Friska Abadi, and Setyo Wahyu Saputro. "Effect of SMOTE Variants on Software Defect Prediction Classification Based on Boosting Algorithm." Jurnal Ilmiah Teknik Elektro Komputer dan Informatika 10, no. 2 (2024): 201–16. https://doi.org/10.26555/jiteki.v10i2.28521.

Abstract:

Detecting software defects early on is critical for avoiding significant financial losses. However, building accurate software defect prediction models can be challenging due to class imbalance, where the data for defective modules is much less than for standard modules. This research addresses this issue using the imbalanced dataset NASA MDP. To address this issue, researchers have proposed new methods that combine data level balancing approaches with 14 variations of the SMOTE algorithm to increase the amount of defective module data. An algorithm-level approach with three boosting algorithms, Catboost, LightGBM, and Gradient Boosting, is applied to classify modules as defective or non-defective. These methods aim to improve the accuracy of software defect prediction. The results show that this new method can produce a more accurate classification than previous studies. The DSMOTE and Gradient Boosting pair has the highest average accuracy (0.9161). The DSMOTE and Catboost model achieved the highest average AUC value (0.9637). The ADASYN kernel and Catboost pair achieved the best average G-mean value (0.9154). The research contribution to software defect prediction involves developing new techniques and evaluating their effectiveness in addressing class imbalance.
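
The SMOTE variants compared in this abstract share one core step: a synthetic minority sample is placed on the segment between a real minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that basic step only; it does not implement DSMOTE, ADASYN or any other specific variant from the paper, and the function name and toy data are hypothetical.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority class (e.g. defective modules).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote_like_oversample(minority, n_new=6)
print(new_points)
```

Oversampling of this kind is applied to the training split only, so that evaluation metrics such as AUC and G-mean are still computed on real samples.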

24

Zou, Chengke. "The House Price Prediction Using Machine Learning Algorithm: The Case of Jinan, China." Highlights in Science, Engineering and Technology 39 (April 1, 2023): 327–33. http://dx.doi.org/10.54097/hset.v39i.6549.

Abstract:

House prices have increased substantially in China since 1998. Because of expensive house prices, most Chinese people have only one chance to select suitable houses. Therefore, building a house price prediction model based on housing conditions is significant for customers to make decisions. This paper collects the estate market data of Jinan city from the HomeLink website and performs several feature selection algorithms to get critical features for house price prediction. The paper compares the classical machine learning methods for the problem, including Multiple Linear Regression, Random Forest, and CatBoost. After cross-validation tests, the CatBoost algorithm, with the lowest Mean Square Error (MSE), is regarded as the most accurate algorithm to predict house prices. The analytic results show that the house price is dominated by the location features such as area and block.

25

Ptr, Agus Fahmi Limas, Muhammad Mizan Siregar, and Irwan Daniel. "Analysis of Gradient Boosting, XGBoost, and CatBoost on Mobile Phone Classification." Journal of Computer Networks, Architecture and High Performance Computing 6, no. 2 (2024): 661–70. http://dx.doi.org/10.47709/cnahpc.v6i2.3790.

Abstract:

In the ever-evolving landscape of mobile phone technology, accurately classifying device specifications is paramount for market analysis and consumer decision-making. This research conducts a comprehensive analysis of mobile phone specification classification using three prominent machine learning algorithms: Gradient Boosting, XGBoost, and CatBoost. Through meticulous dataset acquisition and preprocessing steps, including resolution normalization and price categorization, features essential for classification analysis were standardized. Robust cross-validation techniques were employed to assess model performance effectively. The study demonstrates the significant impact of normalization techniques on improving model performance across all algorithms and fold variations. CatBoost consistently emerges as the top-performing algorithm, followed closely by XGBoost, with Gradient Boosting displaying respectable performance. Notably, CatBoost consistently achieves the highest AUC values and accuracy scores, demonstrating superior performance in accurately classifying mobile phone specifications. These findings underscore the importance of preprocessing methods and algorithm selection in achieving optimal classification results. For mobile phone manufacturers, leveraging machine learning algorithms for effective classification can inform product development strategies, optimizing offerings based on consumer preferences. Similarly, for data analysts, employing appropriate preprocessing techniques and algorithmic approaches can lead to more accurate predictions and informed decision-making. Future research avenues include exploring advanced preprocessing methods, investigating alternative algorithms, and incorporating additional features or datasets to enrich the classification process. 
Overall, this research contributes to understanding mobile phone specification classification through machine learning methodologies, offering actionable insights for industry practitioners and researchers to address evolving market dynamics and consumer preferences.

26

Anudeep, Rayini, and S. John Justin Thangaraj. "Accurate Prediction of Myocardial Infarction By Comparing Logistic Regression Algorithm with CatBoost Classifier." E3S Web of Conferences 399 (2023): 04019. http://dx.doi.org/10.1051/e3sconf/202339904019.

Abstract:

Aim: To forecast Myocardial Infarction in humans with a machine learning model by comparing a Logistic Regression Algorithm with a CatBoost Classifier. The accuracy is enhanced by utilizing the novel LR Classifier. Materials and Methods: The study utilized a total of 20 sample iterations, with 10 samples per group. Group 1 was analyzed using a logistic regression algorithm, while Group 2 was analyzed using a decision tree classifier. The statistical power was set at 80%, and the confidence level was set at 95%. Results: The accuracy with logistic regression is 94.61% and with the CatBoost Classifier is 79.516%; the difference between the groups is statistically significant, with p = 0.015 (<0.05) in the independent-sample t-test between the LR and CB Classifiers. Conclusion: This research concludes that the logistic regression algorithm gives the more accurate prediction, with a difference of 15.1% compared to the CatBoost Classifier.
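
The independent-sample t-test mentioned in this abstract compares the mean scores of two groups. Below is a minimal sketch of the pooled-variance (Student) form of the statistic, assuming equal variances in both groups; whether the paper used this form or an unequal-variance form is not stated, and the input values are toy numbers rather than the paper's accuracies.

```python
import math
from statistics import mean, variance


def independent_t_statistic(group_a, group_b):
    """Student's two-sample t statistic with pooled variance.

    Returns the statistic and its degrees of freedom; the p-value would be
    read from a t distribution with that many degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / dof
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, dof


# Toy per-iteration scores for two groups (not from the paper).
t_stat, dof = independent_t_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 4), dof)
```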

27

Mironenko, Yaroslav V., and Alexey D. Kurzanov. "ASSESSMENT OF THE ELECTRICAL EQUIPMENT INSULATION STATE USING THE GRADIENT BOOSTING ALGORITHM." Vestnik Chuvashskogo universiteta, no. 3 (September 29, 2021): 94–102. http://dx.doi.org/10.47026/1810-1909-2021-3-94-102.

Abstract:

The creation of analytical software products aimed at assessing the state of electrical equipment has become a priority in the development of diagnostics in the power industry. Artificial intelligence methods are useful for solving this problem. In the article, we propose a method for analyzing the monitoring data of partial discharges in the insulation of electrical equipment using machine-learning technologies. An analytical assessment of the partial discharge characteristics allows us to draw conclusions about the insulation state of the object. It is proposed to use integrated diagnostic parameters, such as partial discharge intensity, i.e. the maximum measured value of the apparent charge of single, repetitive and regular partial discharges. The total sample is characterized by an imbalance, which is typical for technical diagnostics in general. Among machine learning algorithms, bagging and boosting have proven to be the most effective. The mathematical apparatus of gradient boosting is considered using the example of the most common algorithms, GBM (Gradient Boosting Machine) and CatBoost. The model was created in the Python programming language. The model created on the basis of the CatBoost algorithm was used for assessing the condition of the oil insulation of power transformers. The model’s accuracy of 68.85% was achieved after optimizing the parameters of the CatBoost algorithm. The article concluded that it is necessary to increase the training sample size and improve its balance. It is inadvisable to interpret the predicted data in the field of diagnostics parameters at the available accuracy of the model’s work.

28

He, Yuxiang, Baisong Yang, and Chiawei Chu. "GA-CatBoost-Weight Algorithm for Predicting Casualties in Terrorist Attacks: Addressing Data Imbalance and Enhancing Performance." Mathematics 12, no. 6 (2024): 818. http://dx.doi.org/10.3390/math12060818.

Abstract:

Terrorism poses a significant threat to international peace and stability. The ability to predict potential casualties resulting from terrorist attacks, based on specific attack characteristics, is vital for protecting the safety of innocent civilians. However, conventional data sampling methods struggle to effectively address the challenge of data imbalance in textual features. To tackle this issue, we introduce a novel algorithm, GA-CatBoost-Weight, designed for predicting whether terrorist attacks will lead to casualties among innocent civilians. Our approach begins with feature selection using the RF-RFE method, followed by leveraging the CatBoost algorithm to handle diverse modal features comprehensively and to mitigate data imbalance. Additionally, we employ a Genetic Algorithm (GA) to fine-tune hyperparameters. Experimental validation has demonstrated the superior performance of our method, achieving a sensitivity of 92.68% and an F1 score of 90.99% with fewer iterations. To the best of our knowledge, our study is the first to apply CatBoost to the prediction of terrorist attack outcomes.

29

Uribeetxebarria, Asier, Ander Castellón, and Ana Aizpurua. "Optimizing Wheat Yield Prediction Integrating Data from Sentinel-1 and Sentinel-2 with CatBoost Algorithm." Remote Sensing 15, no. 6 (2023): 1640. http://dx.doi.org/10.3390/rs15061640.

Abstract:

Accurately estimating wheat yield is crucial for informed decision making in precision agriculture (PA) and improving crop management. In recent years, optical satellite-derived vegetation indices (VIs), such as those from Sentinel-2 (S2), have become widely used, but the availability of images depends on the weather conditions. For their part, Sentinel-1 (S1) backscatter data are less used in agriculture due to their complicated interpretation and processing, but are not impacted by weather. This study investigates the potential benefits of combining S1 and S2 data and evaluates the performance of the categorical boosting (CatBoost) algorithm in crop yield estimation. The study was conducted utilizing dense yield data from a yield monitor, obtained from 39 wheat (Triticum spp. L.) fields. The study analyzed three S2 images corresponding to different crop growth stages (GS) GS30, GS39-49, and GS69-75, and 13 VIs commonly used for wheat yield estimation were calculated for each image. In addition, three S1 images that were temporally close to the S2 images were acquired, and the vertical-vertical (VV) and vertical-horizontal (VH) backscatter were calculated. The performance of the CatBoost algorithm was compared to that of multiple linear regression (MLR), support vector machine (SVM), and random forest (RF) algorithms in crop yield estimation. The results showed that the combination of S1 and S2 data with the CatBoost algorithm produced a yield prediction with a root mean squared error (RMSE) of 0.24 t ha⁻¹, a relative RMSE (rRMSE) of 3.46% and an R² of 0.95. The result indicates a decrease of 30% in RMSE when compared to using S2 alone. However, when this algorithm was used to estimate the yield of a whole plot, leveraging information from the surrounding plots, the mean absolute error (MAE) was 0.31 t ha⁻¹, which means a mean error of 4.38%. Accurate wheat yield estimation with a spatial resolution of 10 m becomes feasible when utilizing satellite data combined with CatBoost.
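
The error measures quoted in this abstract (RMSE, rRMSE, MAE, R²) can be computed as in the sketch below. It assumes that rRMSE is the RMSE expressed as a percentage of the mean observed yield, which is a common convention but is not confirmed by the abstract; the function name and the yield values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, relative RMSE (percent of observed mean) and R^2."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    # Assumption: rRMSE = 100 * RMSE / mean of the observed values.
    rrmse = 100.0 * rmse / mean_obs
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "rrmse": rrmse, "r2": r2}


# Hypothetical yields in t/ha (not data from the study).
observed = [6.0, 7.0, 8.0, 7.0]
predicted = [6.5, 7.0, 7.5, 7.0]
scores = regression_metrics(observed, predicted)
print(scores)
```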

30

Liu, Yang, Tianxing Yang, Liwei Tian, Bincheng Huang, Jiaming Yang, and Zihan Zeng. "Ada-XG-CatBoost: A Combined Forecasting Model for Gross Ecosystem Product (GEP) Prediction." Sustainability 16, no. 16 (2024): 7203. http://dx.doi.org/10.3390/su16167203.

Abstract:

The degradation of the ecosystem and the loss of natural capital have seriously threatened the sustainable development of human society and economy. Currently, most research on Gross Ecosystem Product (GEP) is based on statistical modeling methods, which face challenges such as high modeling difficulty, high costs, and inaccurate quantitative methods. However, machine learning models are characterized by high efficiency, fewer parameters, and higher accuracy. Despite these advantages, their application in GEP research is not widespread, particularly in the area of combined machine learning models. This paper includes both a GEP combination model and an explanatory analysis model. This paper is the first to propose a combined GEP prediction model called Ada-XGBoost-CatBoost (Ada-XG-CatBoost), which integrates the Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost) algorithms, and SHapley Additive exPlanations (SHAP) model. This approach overcomes the limitations of single-model evaluations and aims to address the current issues of inaccurate and incomplete GEP assessments. It provides new guidance and methods for enhancing the value of ecosystem services and achieving regional sustainable development. Based on the actual ecological data of a national city, data preprocessing and feature correlation analysis are carried out using XGBoost and CatBoost algorithms, AdaGrad optimization algorithm, and the Bayesian hyperparameter optimization method. By selecting the 11 factors that predominantly influence GEP, training the model using these selected feature datasets, and optimizing the Bayesian parameters, the error gradient is then updated to adjust the weights, achieving a combination model that minimizes errors. This approach reduces the risk of overfitting in individual models and enhances the predictive accuracy and interpretability of the model. 
The results indicate that the mean squared error (MSE) of the Ada-XG-CatBoost model is reduced by 65% and 70% compared to the XGBoost and CatBoost, respectively. Additionally, the mean absolute error (MAE) is reduced by 4.1% and 42.6%, respectively. Overall, the Ada-XG-CatBoost combination model has a more accurate and stable predictive performance, providing a more accurate, efficient, and reliable reference for the sustainable development of the ecological industry.

31

Chang, Wenfeng, Xiao Wang, Jing Yang, and Tao Qin. "An Improved CatBoost-Based Classification Model for Ecological Suitability of Blueberries." Sensors 23, no. 4 (2023): 1811. http://dx.doi.org/10.3390/s23041811.

Abstract:

Selecting the best planting area for blueberries is an essential issue in agriculture. To better improve the effectiveness of blueberry cultivation, a machine learning-based classification model for blueberry ecological suitability was proposed for the first time and its validation was conducted by using multi-source environmental features data in this paper. The sparrow search algorithm (SSA) was adopted to optimize the CatBoost model and classify the ecological suitability of blueberries based on the selection of data features. Firstly, the Borderline-SMOTE algorithm was used to balance the number of positive and negative samples. The Variance Inflation Factor and information gain methods were applied to filter out the factors affecting the growth of blueberries. Subsequently, the processed data were fed into the CatBoost for training, and the parameters of the CatBoost were optimized to obtain the optimal model using SSA. Finally, the SSA-CatBoost model was adopted to classify the ecological suitability of blueberries and output the suitability types. Taking a study on a blueberry plantation in Majiang County, Guizhou Province, China as an example, the findings demonstrate that the AUC value of the SSA-CatBoost-based blueberry ecological suitability model is 0.921, which is 2.68% higher than that of the CatBoost (AUC = 0.897) and is significantly higher than Logistic Regression (AUC = 0.855), Support Vector Machine (AUC = 0.864), and Random Forest (AUC = 0.875). Furthermore, the ecological suitability of blueberries in Majiang County is mapped according to the classification results of different models. When compared with the actual blueberry cultivation situation in Majiang County, the classification results of the SSA-CatBoost model proposed in this paper match best with the real blueberry cultivation situation, which is of a high reference value for the selection of blueberry cultivation sites.
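
The "2.68% higher" figure in this abstract is consistent with a relative improvement over the plain CatBoost AUC, not an absolute difference in AUC. The short check below uses only the two AUC values quoted in the abstract; the variable names are illustrative.

```python
auc_ssa_catboost = 0.921  # value quoted in the abstract
auc_catboost = 0.897      # value quoted in the abstract

absolute_gain = auc_ssa_catboost - auc_catboost
relative_gain_pct = 100.0 * absolute_gain / auc_catboost

print(f"absolute: {absolute_gain:.3f}, relative: {relative_gain_pct:.2f}%")
# prints "absolute: 0.024, relative: 2.68%"
```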

32

Phoeuk, Menghay, and Minho Kwon. "Accuracy Prediction of Compressive Strength of Concrete Incorporating Recycled Aggregate Using Ensemble Learning Algorithms: Multinational Dataset." Advances in Civil Engineering 2023 (May 17, 2023): 1–23. http://dx.doi.org/10.1155/2023/5076429.

Abstract:

The use of alternative materials and recycling in construction has gained popularity in recent years as part of the industry’s commitment to sustainability. One such material, recycled aggregates, has been extensively studied over the past two decades for its potential to replace natural aggregates in cement-based composites. However, the unique properties of recycled aggregates make traditional concrete mix design methods ineffective in determining their target compressive strength. To address this challenge, four machine learning models based on ensemble learning algorithms, including CatBoost regressor (CatBoost), light gradient-boosting machine regressor (LGBM), random forest regressor (RFR), and extreme gradient-boosting regressor (XGBoost), were employed to predict the compressive strength of recycled aggregate concrete. Results demonstrate that the proposed models are highly accurate and generalizable, with high coefficients of determination and low error predictions. The CatBoost model performed the best, exhibiting an R2 of 0.938 and low mean absolute error and root mean squared error values of 2.639 and 3.885, respectively, in the blind evaluation process. Although the random forest regression algorithm performed the least well among the four models, it still outperformed conventional machine learning algorithms such as support vector machines and artificial neural networks. The findings in this study suggested that the CatBoost model is the optimal choice for predicting concrete’s compressive strength due to its high accuracy and low prediction error.

33

Chen, Mingjie, Xincai Qiu, Weisheng Zeng, and Daoli Peng. "Combining Sample Plot Stratification and Machine Learning Algorithms to Improve Forest Aboveground Carbon Density Estimation in Northeast China Using Airborne LiDAR Data." Remote Sensing 14, no. 6 (2022): 1477. http://dx.doi.org/10.3390/rs14061477.

Abstract:

Timely, accurate estimates of forest aboveground carbon density (AGC) are essential for understanding the global carbon cycle and providing crucial reference information for climate-change-related policies. To date, airborne LiDAR has been considered as the most precise remote-sensing-based technology for forest AGC estimation, but it suffers great challenges from various uncertainty sources. Stratified estimation has the potential to reduce the uncertainty and improve the forest AGC estimation. However, the impact of stratification and how to effectively combine stratification and modeling algorithms have not been fully investigated in forest AGC estimation. In this study, we performed a comparative analysis of different stratification approaches (non-stratification, forest type stratification (FTS) and dominant species stratification (DSS)) and different modeling algorithms (stepwise regression, random forest (RF), Cubist, extreme gradient boosting (XGBoost) and categorical boosting (CatBoost)) to identify the optimal stratification approach and modeling algorithm for forest AGC estimation, using airborne LiDAR data. The analysis of variance (ANOVA) was used to quantify and determine the factors that had a significant effect on the estimation accuracy. The results revealed the superiority of stratified estimation models over the unstratified ones, with higher estimation accuracy achieved by the DSS models. Moreover, this improvement was more significant in coniferous species than broadleaf species. The ML algorithms outperformed stepwise regression and the CatBoost models based on DSS provided the highest estimation accuracy (R2 = 0.8232, RMSE = 5.2421, RRMSE = 20.5680, MAE = 4.0169 and Bias = 0.4493). The ANOVA of the prediction error indicated that the stratification method was a more important factor than the regression algorithm in forest AGC estimation. 
This study demonstrated the positive effect of stratification and how the combination of DSS and the CatBoost algorithm can effectively improve the estimation accuracy of forest AGC. Integrating this strategy with national forest inventory could help improve the monitoring of forest carbon stock over large areas.

34

Bhargava, Kumar, and Kumar Tejaswini. "Comparative Analysis of ML based Gradient Boosting Algorithms: XGBoost, CatBoost, and LightGBM." Journal of Scientific and Engineering Research 7, no. 8 (2020): 235–39. https://doi.org/10.5281/zenodo.12666689.

Abstract:

Gradient boosting algorithms have become a vital component in the realm of machine learning, thanks to their outstanding performance in a wide range of predictive modeling tasks. This paper presents a comparative analysis of three prominent gradient boosting algorithms: XGBoost, CatBoost, and LightGBM. The study evaluates each algorithm based on its theoretical basis and implementation details, along with a comparative analysis. XGBoost excels in terms of scalability and versatility, CatBoost is distinguished by its ability to manage categorical features and prevent overfitting, and LightGBM is recognized for its efficiency and capacity to process substantial amounts of data quickly. The objective of this paper is to provide a clear understanding of the differences and similarities between these algorithms, enabling practitioners to make well-informed decisions when selecting the most appropriate tool for their specific machine learning challenges.
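
The abstract does not say how CatBoost manages categorical features. One mechanism described in CatBoost's own documentation is ordered target statistics, where each row's category is encoded using only the targets of rows that precede it in a permutation, which limits target leakage. The sketch below simplifies this by using the given row order as the permutation; the function name, prior and toy data are hypothetical, and this is not the library's implementation.

```python
def ordered_target_statistics(categories, targets, prior, prior_weight=1.0):
    """Encode each row's category from the targets of earlier rows only.

    encoding = (sum of earlier targets in this category + weight * prior)
               / (count of earlier rows in this category + weight)
    """
    sums, counts = {}, {}
    encoded = []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        encoded.append((s + prior_weight * prior) / (c + prior_weight))
        # Only now does the current row's target become visible to later rows.
        sums[cat] = s + y
        counts[cat] = c + 1
    return encoded


categories = ["a", "b", "a", "a", "b"]
targets = [1, 0, 0, 1, 1]
encoded = ordered_target_statistics(categories, targets, prior=0.5)
print(encoded)  # [0.5, 0.5, 0.75, 0.5, 0.25]
```

The first occurrence of each category falls back to the prior, since no earlier target is available for it.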

35

Imam, Husni Al Amin, Amin Fatkhul, and Wibisono Setyawan. "Comparative Performance Analysis of Boosting Ensemble Learning Models for Optimizing Marketing Promotion Strategy Classification." Engineering and Technology Journal 10, no. 05 (2025): 4909–17. https://doi.org/10.5281/zenodo.15357117.

Abstract:

This study evaluates the performance of four boosting algorithms in ensemble learning, namely AdaBoost, Gradient Boosting, XGBoost, and CatBoost, for optimizing the classification of marketing promotion strategies. The rise of digitalization has driven the use of machine learning to understand consumer behavior better and enhance the effectiveness of promotional campaigns. Using the Marketing Promotion Campaign Uplift Modeling dataset from Kaggle, this study examines the capabilities of each algorithm in handling complex and imbalanced customer data. The evaluation metrics include accuracy, precision, recall, F1-score, and Area Under the Curve (AUC). Results indicate that XGBoost excels in precision, while Gradient Boosting achieves the highest AUC value, demonstrating superior ability in distinguishing positive and negative classes. CatBoost provides stable performance with categorical data, whereas AdaBoost shows strength in recall but is prone to false-positive predictions. Although all four algorithms exhibit good performance, the main challenge lies in addressing class imbalance. This study offers insights for marketing practitioners in selecting the most suitable algorithm and highlights the importance of data-balancing strategies to improve predictive accuracy in data-driven marketing.

36

Abujayyab, Sohaib K. M., Moustafa Moufid Kassem, Ashfak Ahmad Khan, et al. "Wildfire Susceptibility Mapping Using Five Boosting Machine Learning Algorithms: The Case Study of the Mediterranean Region of Turkey." Advances in Civil Engineering 2022 (December 28, 2022): 1–18. http://dx.doi.org/10.1155/2022/3959150.

Abstract:

Forest fires caused by different environmental and human factors are responsible for the extensive destruction of natural and economic resources. Modern machine learning techniques have become popular in developing very accurate and precise susceptibility maps of various natural disasters to help reduce the occurrence of such calamities. The present study has applied and tested multiple algorithms to map the areas susceptible to wildfire in the Mediterranean Region of Turkey. Besides, the performance of XGBoost, CatBoost, Gradient Boost, AdaBoost, and LightGBM methods for wildfire susceptibility mapping is also examined. The results have revealed the higher testing accuracy of CatBoost (95.47%) algorithm, followed by LightGBM (94.70%), XGBoost (88.8%), AdaBoost (86.0%), and GBM (84.48%) algorithms. Resultant wildfire susceptibility maps provide proper inventories for forest engineers, planners, and local governments for future policies regarding disaster management in Turkey.

37

Ogar, Vincent Nsed, Sajjad Hussain, and Kelum A. A. Gamage. "Transmission Line Fault Classification of Multi-Dataset Using CatBoost Classifier." Signals 3, no. 3 (2022): 468–82. http://dx.doi.org/10.3390/signals3030027.

Abstract:

Transmission line fault classification forms the basis of fault protection management in power systems. Because faults have adverse effects on transmission lines, adequate measures must be implemented to avoid power outages. This paper focuses on using the categorical boosting (CatBoost) algorithm classifier to analyse and train multiple voltage and current data from a 330 kV and 500 km-long simulated faulty transmission line model designed using Matlab/Simulink. From it, 93,340 fault data points were extracted. The CatBoost classifier was employed to classify the faults after different machine learning algorithms were used to train the same data with different parameters. The trainer achieved the best accuracy of 99.54%, with an error of 0.46% for 748 iterations out of 1000. The algorithm was selected for its high performance in classifying faults based on accuracy, precision and speed. In addition, it is easy to use and handles multiple datasets. In contrast, a support vector machine and an artificial neural network each have a longer training time than the proposed method’s 58.5 s. Proper fault classification techniques assist in the effective fault management and planning of power system control, thereby preventing energy waste and providing high performance.

38

Lin, Nan, Xunhu Ma, Ranzhe Jiang, Menghong Wu, and Wenchun Zhang. "Estimation of Maize Residue Cover Using Remote Sensing Based on Adaptive Threshold Segmentation and CatBoost Algorithm." Agriculture 14, no. 5 (2024): 711. http://dx.doi.org/10.3390/agriculture14050711.

Abstract:

Maize residue cover (MRC) is an important parameter to quantify the degree of crop residue cover in the field and its spatial distribution characteristics. It is also a key indicator of conservation tillage. Rapid and accurate estimation of maize residue cover (MRC) and spatial mapping are of great significance to increasing soil organic carbon, reducing wind and water erosion, and maintaining soil and water. Currently, the estimation of maize residue cover in large areas suffers from low modeling accuracy and poor working efficiency. Therefore, how to improve the accuracy and efficiency of maize residue cover estimation has become a research hotspot. In this study, adaptive threshold segmentation (Yen) and the CatBoost algorithm are integrated and fused to construct a residue coverage estimation method based on multispectral remote sensing images. The maize planting areas in and around Sihe Town in Jilin Province, China, were selected as typical experimental regions, and the unmanned aerial vehicle (UAV) was employed to capture maize residue cover images of sample plots within the area. The Yen algorithm was applied to calculate and analyze maize residue cover. The successive projections algorithm (SPA) was used to extract spectral feature indices from Sentinel-2A multispectral images. Subsequently, the CatBoost algorithm was used to construct a maize residue cover estimation model based on spectral feature indices, thereby plotting the spatial distribution map of maize residue cover in the experimental area. The results show that the image segmentation based on the Yen algorithm outperforms traditional segmentation methods, with the highest Dice coefficient reaching 81.71%, effectively improving the accuracy of maize residue cover recognition in sample plots. By combining the spectral index calculation with the SPA algorithm, the spectral features of the images are effectively extracted, and the spectral feature indices such as NDTI and STI are determined. 
These indices are significantly correlated with maize residue cover. The accuracy of the maize residue cover estimation model built using the CatBoost model surpasses that of traditional machine learning models, with a maximum determination coefficient (R2) of 0.83 in the validation set. The maize residue cover estimation model constructed based on the Yen and CatBoost algorithms effectively enhances the accuracy and reliability of estimating maize residue cover in large areas using multispectral imagery, providing accurate and reliable data support and services for precision agriculture and conservation tillage.

39

Krishnan, S., S. K. Aruna, Karthick Kanagarathinam, and Ellappan Venugopal. "Identification of Dry Bean Varieties Based on Multiple Attributes Using CatBoost Machine Learning Algorithm." Scientific Programming 2023 (April 21, 2023): 1–21. http://dx.doi.org/10.1155/2023/2556066.

Abstract:

Dry beans are the most widely grown edible legume crop worldwide, with high genetic diversity. Crop production is strongly influenced by seed quality. So, seed classification is important for both marketing and production because it helps build sustainable farming systems. The major contribution of this research is to develop a multiclass classification model using machine learning (ML) algorithms to classify the seven varieties of dry beans. The balanced dataset was created using the random undersampling method to avoid classification bias of ML algorithms towards the majority group caused by the unbalanced multiclass dataset. The dataset from the UCI ML repository is utilised for developing the multiclass classification model, and the dataset includes the features of seven distinct varieties of dried beans. To address the skewness of the dataset, a Box-Cox transformation (BCT) was performed on the dataset’s attributes. The 22 ML classification algorithms have been applied to the balanced and preprocessed dataset to identify the best ML algorithm. The ML algorithm results have been validated with a 10-fold cross-validation approach, and during validation, the CatBoost ML algorithm achieved the highest overall mean accuracy of 93.8 percent, with a range of 92.05 percent to 95.35 percent.
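
The 10-fold cross-validation summary quoted in this abstract (a mean accuracy together with its range across folds) can be illustrated with a minimal, standard-library-only sketch. The helper names `k_fold_indices`, `cross_validate`, and `summarize`, and the `fit_and_score` callback, are hypothetical and are not taken from the cited paper, which does not publish its code here.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, fit_and_score, k=10, seed=0):
    """Return one score per fold.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that
    trains a model on train_idx and returns its accuracy on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the held-out one goes into the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return scores


def summarize(scores):
    """Mean score plus the (min, max) range across folds."""
    return mean(scores), (min(scores), max(scores))
```

In practice the callback would wrap whichever classifier is being validated; the sketch only shows how the folds are formed and how the per-fold scores collapse into a mean and a range.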

40

Wang, Dazhong, Yinghui Chang, Pengfei Ji, Yanchun Suo, and Ning Chen. "State of Charge Prediction of Mine-Used LiFePO4 Battery Based on PSO-Catboost." Energies 17, no. 23 (2024): 5920. http://dx.doi.org/10.3390/en17235920.

Abstract:

The accurate prediction of battery state of charge (SOC) is one of the critical technologies for the safe operation of a power battery. Aiming at the problem of mine power battery SOC prediction, based on the comparative experiments and analysis of particle swarm optimization (PSO) and Categorical Boosting (Catboost) characteristics, the PSO-Catboost model is proposed to predict the SOC of a power lithium iron phosphate battery. Firstly, the classification model based on Catboost is constructed, and then the particle swarm algorithm is used to optimize the Catboost hyperparameters to build the optimal model. The experiment and comparison show that the optimized model’s prediction accuracy and average precision are superior to other comparative models. Compared with the Catboost model, the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) values of the PSO-Catboost model decreased by 12.4% and 25.4% during charging and decreased by 5.5% and 12.2% during discharging. Finally, the Random Forest (RF) and Extreme Gradient Boosting (XGBoost) models, both ensemble learning models, are selected and compared with PSO-Catboost after being optimized via PSO. The experimental results show that the proposed model has a better performance. In this paper, experiments show that the optimization model can select parameters more intelligently, reduce the error caused by artificial experience to adjust parameters, and have a better theoretical value and practical significance.
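
The percentage decreases in MAE and RMSE quoted in this abstract are presumably relative reductions against the baseline model's error. A minimal sketch of the two metrics and of that comparison follows; the function names are hypothetical, and reading the percentages as `(baseline - improved) / baseline` is an assumption, since the abstract does not state the formula.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def relative_decrease(baseline, improved):
    """Percentage by which `improved` is lower than `baseline` (assumed definition)."""
    return 100.0 * (baseline - improved) / baseline
```

For example, a baseline error of 2.0 and an optimized error of 1.5 correspond to a 25% decrease under this definition. Because RMSE squares each error, it penalizes large individual errors more heavily than MAE does.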

41

Kong, Yueping, and Ziyu Liu. "Optimization of Zinc Smelting Slag Melting Point Based on Catboost and Improved Snake Optimization Algorithm." Applied Sciences 14, no. 11 (2024): 4603. http://dx.doi.org/10.3390/app14114603.

Abstract:

The regulation of the melting point of zinc smelting slag has an important impact on the subsequent smelting processes of the metal. In actual production, uncontrollable melting points may result in inconsistent product quality, which has a great negative impact on the smelter’s efficiency and environmental protection. However, the regulation mechanism of the melting point of the smelting slag is complex, with many influencing factors, and there is no recognized high-precision calculation method. In response to these challenges, this study introduces an innovative approach for optimizing the melting point of zinc smelting slag based on the improved Snake Optimization (ISO) algorithm. The melting point of zinc smelting slag is modeled using the Catboost algorithm, and the model parameters are optimized using the Tree-structured Parzen Estimator (TPE) to improve the accuracy of the model. Next, the ISO algorithm is employed to conduct optimization calculations, determining the optimal values of various production process parameters that minimize the melting point. The effectiveness of this approach was evaluated using diverse modeling algorithms and test functions, subsequently applied to optimize and validate actual production data from a smelter in Shaanxi, China. Statistical analyses reveal that the TPE-optimized Catboost model exhibits an R2 of 93.89%, an RMSE of 7.02 °C, an MAE of 6.19 °C, and an MRE of 7.88%, surpassing performance metrics of alternative algorithms. Regarding optimization efficacy, the proposed ISO algorithm achieves an average reduction of 65 °C in the melting point and demonstrates superior robustness compared to both actual production data and alternative algorithms.

42

Madduru, Soumya. "ENSEMBLE CATBOOST-BASED MICROARRAY GENE EXPRESSION RETRIEVAL SYSTEM FOR ENHANCED DISEASE CLASSIFICATION." ICTACT Journal on Soft Computing 16, no. 1 (2025): 3814–19. https://doi.org/10.21917/ijsc.2025.0529.

Abstract:

Microarray gene expression profiling is a crucial tool in identifying genetic patterns associated with complex diseases. However, high dimensionality and noise in microarray datasets pose challenges for effective gene retrieval and classification. Traditional classifiers often struggle to accurately retrieve relevant gene features and achieve robust disease classification performance due to overfitting and sensitivity to noise. This paper proposes an Enhanced Gene Retrieval System leveraging an Ensemble CatBoost Algorithm. CatBoost, a gradient boosting decision tree framework, is known for handling categorical features and avoiding prediction shift. The system integrates feature selection techniques with CatBoost to optimize gene relevance and improve classification accuracy. Pre-processing includes normalization and principal component analysis (PCA) for dimensionality reduction. The ensemble approach combines multiple CatBoost models using bagging to improve robustness and generalization. The proposed method was evaluated on benchmark microarray datasets (e.g., Leukemia, Colon, Prostate). It significantly outperformed traditional models like SVM, Random Forest, KNN, and XGBoost, achieving up to 96.2% accuracy, 94.8% precision, 95.1% recall, and 0.97 F1-score. The ensemble CatBoost model demonstrated superior stability and interpretability in gene selection and disease classification.

43

Shan, Jingyi, and Linan Lin. "Construction of a CatBoost Classification Prediction Model for Municipal-Level Teacher Identity Based on Professional Experience." Communications in Humanities Research 53, no. 1 (2025): 94–100. https://doi.org/10.54254/2753-7064/2025.21770.

Abstract:

To explore the relationship between teacher identity characteristics and professional experience, this study takes municipal-level outstanding teachers in the Beijing-Tianjin-Hebei region as the research sample and constructs a teacher identity classification prediction model based on the CatBoost machine learning algorithm. Guided by the principle of "promoting the great spirit of educators," this paper extracts 14 professional experience characteristic indicators from four dimensions: educational and teaching ability, collaboration and innovation ability, research and practice ability, among others. Leveraging the advantages of the CatBoost algorithm in handling categorical features, the study calculates the importance of various identity characteristics of outstanding teachers and applies them in the training and testing of variables. In terms of innovation, the CatBoost algorithm improves upon the traditional Gradient Boosting Decision Tree (GBDT) by refining the classification effectiveness through test data evaluation, thereby ensuring precise model assessment. The research findings indicate that machine learning has broad applicability in evaluating teacher professional development and can accurately reveal the composition of teacher professional identity. This provides strong data support and scientific evidence for advancing teacher professional development, formulating targeted training strategies, and promoting the construction of a strong education system.

44

Maulana, Aga, Razief Perucha Fauzie Afidh, Nur Balqis Maulydia, Ghazi Mauer Idroes, and Souvia Rahimah. "Predicting Obesity Levels with High Accuracy: Insights from a CatBoost Machine Learning Model." Infolitika Journal of Data Science 2, no. 1 (2024): 17–27. http://dx.doi.org/10.60084/ijds.v2i1.195.

Abstract:

This study aims to develop a machine learning model using the CatBoost algorithm to predict obesity based on demographic, lifestyle, and health-related features and compare its performance with other machine learning algorithms. The dataset used in this study, containing information on 2,111 individuals from Mexico, Peru, and Colombia, was used to train and evaluate the CatBoost model. The dataset included gender, age, height, weight, eating habits, physical activity levels, and family history of obesity. The model's performance was assessed using accuracy, precision, recall, and F1-score and compared to logistic regression, K-nearest neighbors (KNN), random forest, and naive Bayes algorithms. Feature importance analysis was conducted to identify the most influential factors in predicting obesity levels. The results indicate that the CatBoost model achieved the highest accuracy at 95.98%, surpassing other models. Furthermore, the CatBoost model demonstrated superior precision (96.08%), recall (95.98%), and F1-score (96.00%). The confusion matrix revealed that the model accurately predicted the majority of instances in each obesity level category. Feature importance analysis identified weight, height, and gender as the most influential factors in predicting obesity levels, followed by dietary habits, physical activity, and family history of overweight. The model's high accuracy, precision, recall, and F1-score and ability to handle categorical variables effectively make it a valuable tool for obesity risk assessment and classification. The insights gained from the feature importance analysis can guide the development of targeted obesity prevention and management strategies, focusing on modifiable risk factors such as diet and physical activity. 
While further validation on diverse populations is necessary, the CatBoost model's results demonstrate its potential to support clinical decision-making and inform public health initiatives in the fight against the global obesity epidemic.

45

Said, Noha Mostafa Mohamed, Sabna Machinchery Ali, Naseema Shaik, Khan Mohamed Jarina Begum, Dr Anwaar Ahmed Abd elLatif Shaban, and Dr Betty Elezebeth Samuel. "Analysis of Internet of Things to Enhance Security Using Artificial Intelligence based Algorithm." Journal of Internet Services and Information Security 14, no. 4 (2024): 590–604. https://doi.org/10.58346/jisis.2024.i4.037.

Abstract:

Exploring creative methods to secure IoT networks is vital due to the enormous security concerns created by the rapid proliferation of the Internet of Things (IoT). To increase the security of the IoT, this study examines the use of artificial intelligence (AI), specifically deep learning (DL) and machine learning (ML) techniques. Three state-of-the-art DL algorithms (Long Short-Term Memory (LSTM), Deep Belief Networks (DBN), and Convolutional Neural Networks (CNN)) are examined along with three ML methods (CatBoost, LightGBM, and XGBoost). These algorithms are renowned for their capability to handle large and unbalanced datasets. This work tests how well each algorithm can identify anomalies, categorize attacks, and forecast vulnerabilities using IoT security datasets such as CICIDS 2017 and IoT-23. The research evaluates the algorithms by comparing their accuracy and training time. CatBoost and LightGBM perform well on classification tasks, whereas DL algorithms like CNN and LSTM perform well on sequential data and complicated attack patterns. To provide the groundwork for creating AI-driven security solutions optimised for IoT systems, this research sheds light on the benefits and drawbacks of each method.

46

Ponkumar, G., S. Jayaprakash, and Karthick Kanagarathinam. "Advanced Machine Learning Techniques for Accurate Very-Short-Term Wind Power Forecasting in Wind Energy Systems Using Historical Data Analysis." Energies 16, no. 14 (2023): 5459. http://dx.doi.org/10.3390/en16145459.

Abstract:

Accurate wind power forecasting plays a crucial role in the planning of unit commitments, maintenance scheduling, and maximizing profits for power traders. Uncertainty and changes in wind speeds pose challenges to the integration of wind power into the power system. Therefore, the reliable prediction of wind power output is a complex task with significant implications for the efficient operation of electricity grids. Developing effective and precise wind power prediction systems is essential for the cost-efficient operation and maintenance of modern wind turbines. This article focuses on the development of a very-short-term forecasting model using machine learning algorithms. The forecasting model is evaluated using LightGBM, random forest, CatBoost, and XGBoost machine learning algorithms with 16 selected parameters from the wind energy system. The performance of the machine learning-based wind energy forecasting is assessed using metrics such as mean absolute error (MAE), mean-squared error (MSE), root-mean-squared error (RMSE), and R-squared. The results indicate that the random forest algorithm performs well during training, while the CatBoost algorithm demonstrates superior performance, with an RMSE of 13.84 for the test set, as determined by 10-fold cross-validation.

47

Harriz, Muhammad Alfathan, Nurhaliza Vania Akbariani, Harlis Setiyowati, and Handri Santoso. "CLASSIFYING VILLAGE FUND IN WEST JAVA, INDONESIA USING CATBOOST ALGORITHM." Jurnal Indonesia : Manajemen Informatika dan Komunikasi 4, no. 2 (2023): 691–97. http://dx.doi.org/10.35870/jimik.v4i2.269.

Abstract:

With over 261 million inhabitants, Indonesia is home to approximately 15,000 villages, according to the Ministry of Villages, Disadvantaged Regions, and Transmigration. Among these, 1,406 are in West Java. Of these, 504 are advanced, 464 are developing, 390 are disadvantaged, and 48 are very disadvantaged. The CatBoost machine learning model was used to classify village funds in West Java from 2018 to 2021 and achieved an accuracy of 75%, a precision of 79%, a recall of 79%, and an F1 score of 79%, demonstrating its excellent performance. However, missing data points had to be removed from the analysis, and it is suggested that a more sophisticated method for handling missing values be used in future studies. In addition, hyperparameter tuning could be employed to increase the model's performance, and a variety of metrics could be used to accurately assess the results. Overall, CatBoost may benefit the Indonesian Government by helping it classify village funds according to their status, channel funds more accurately and efficiently, and observe the situation of a village year over year.

48

Torres, Fabricio Lozada, Sharon Álvarez Gómez, Diego Palma Rivero, Christian F. Tantaleán Odar, and Sayfuddinov Shukhrat. "Predictive Modeling Through Fusion of Passengers Information Transferred to Alternate Dimensions." Fusion: Practice and Applications 14, no. 1 (2024): 252–62. http://dx.doi.org/10.54216/fpa.140118.

Abstract:

This research focuses on the identification of passengers in dimensions, using information fusion as a tool. We recognize the challenges involved in identifying individuals who have been transferred to alternate dimensions, and in this study we make use of CatBoost, an open-source machine learning algorithm, to address this problem. Our approach includes a preprocessing strategy that involves filling in missing values using techniques like priori distribution terms, which helps ensure the reliability of our dataset. By leveraging CatBoost's ability to handle variables and prevent overfitting, we achieve results in accurately predicting passenger movement across dimensions. Our analysis highlights CatBoost's effectiveness in identifying patterns within data, leading to more precise predictions for interdimensional passenger transportation. Additionally, we incorporate techniques like Greedy TS augmentation to enhance the adaptability of the algorithm and improve precision while reducing bias in modeling. Proof-of-concept experiments demonstrate that the proposed fusion system not only advances predictive modeling in niche domains but also paves the way for broader applications of machine learning in deciphering complex phenomena beyond traditional realms, marking a significant stride in understanding and addressing unconventional challenges.

49

Zhang, Yuzhen, Jun Ma, Shunlin Liang, Xisheng Li, and Manyao Li. "An Evaluation of Eight Machine Learning Regression Algorithms for Forest Aboveground Biomass Estimation from Multiple Satellite Data Products." Remote Sensing 12, no. 24 (2020): 4015. http://dx.doi.org/10.3390/rs12244015.

Abstract:

This study provided a comprehensive evaluation of eight machine learning regression algorithms for forest aboveground biomass (AGB) estimation from satellite data based on leaf area index, canopy height, net primary production, and tree cover data, as well as climatic and topographical data. Some of these algorithms have not been commonly used for forest AGB estimation such as the extremely randomized trees, stochastic gradient boosting, and categorical boosting (CatBoost) regression. For each algorithm, its hyperparameters were optimized using grid search with cross-validation, and the optimal AGB model was developed using the training dataset (80%) and AGB was predicted on the test dataset (20%). Performance metrics, feature importance as well as overestimation and underestimation were considered as indicators for evaluating the performance of an algorithm. To reduce the impacts of the random training-test data split and sampling method on the performance, the above procedures were repeated 50 times for each algorithm under the random sampling, the stratified sampling, and separate modeling scenarios. The results showed that five tree-based ensemble algorithms performed better than the three nonensemble algorithms (multivariate adaptive regression splines, support vector regression, and multilayer perceptron), and the CatBoost algorithm outperformed the other algorithms for AGB estimation. Compared with the random sampling scenario, the stratified sampling scenario and separate modeling did not significantly improve the AGB estimates, but modeling AGB for each forest type separately provided stable results in terms of the contributions of the predictor variables to the AGB estimates. All the algorithms showed forest AGB were underestimated when the AGB values were larger than 210 Mg/ha and overestimated when the AGB values were less than 120 Mg/ha. 
This study highlighted the capability of ensemble algorithms to improve AGB estimates and the necessity of improving AGB estimates for high and low AGB levels in future studies.
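
The evaluation protocol described in this abstract, an 80%/20% random training-test split repeated 50 times, can be sketched with the standard library alone. The names `train_test_split_indices`, `repeated_holdout`, and the `fit_and_score` callback are hypothetical; the sketch covers only the random-sampling scenario, not the stratified or per-forest-type variants, and it is not the authors' implementation.

```python
import random
from statistics import mean, stdev


def train_test_split_indices(n_samples, test_fraction=0.2, rng=None):
    """Randomly partition indices into (train, test) lists."""
    if rng is None:
        rng = random.Random()
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def repeated_holdout(n_samples, fit_and_score, repeats=50, seed=0):
    """Repeat the random split `repeats` times and summarize the scores.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that
    fits a model on the training indices and returns a test-set metric.
    """
    rng = random.Random(seed)  # one generator, so every repeat gets a new split
    scores = []
    for _ in range(repeats):
        train_idx, test_idx = train_test_split_indices(n_samples, 0.2, rng)
        scores.append(fit_and_score(train_idx, test_idx))
    return mean(scores), stdev(scores)
```

Reporting the spread across repeats alongside the mean shows how much of an algorithm's apparent advantage could be due to a single favourable split.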

50

Chepurnenko, Anton, Tatiana Kondratieva, Timur Deberdeev, Vladimir Akopyan, Arthur Avakov, and Viacheslav Chepurnenko. "Prediction of rheological parameters of polymers, using gradient boosting algorithm CatBoost." All the Materials. Encyclopedic Reference Book., no. 6 (2023): 21–29. http://dx.doi.org/10.31044/1994-6260-2023-0-6-21-29.

Abstract:

The article deals with the problem of determining the rheological parameters of polymers from stress relaxation curves using the CatBoost machine learning algorithm. The model is trained on theoretical curves constructed using the non-linear Maxwell-Gurevich equation. A comparison is made with other methods, including the classical algorithm, non-linear optimization methods and artificial neural networks.
