


Journal articles on the topic 'XGB Regressor'

Author: Grafiati

Published: 7 June 2025

Last updated: 17 July 2025



Consult the top 50 journal articles for your research on the topic 'XGB Regressor.'


1

Liang, Qipeng. "Mobile phone price prediction: A comparative study among four models." Applied and Computational Engineering 48, no. 1 (2024): 212–18. http://dx.doi.org/10.54254/2755-2721/48/20241516.


Abstract:

As science and technology advance rapidly, mobile phones have become an integral part of people's lives. Because different phone models have different structural foundations, mobile phone prices fluctuate constantly. As artificial intelligence develops, mobile phone price forecasts are becoming more precise. This article compares several machine learning approaches and ranks the importance of the variables in order to determine the most accurate way to forecast mobile phone prices. The machine learning techniques used are linear regression (LR), random forest regressor (RFR), XGB Regressor, and support vector machine regressor (SVM). R^2 is used to determine which model predicts mobile phone prices most accurately. Among the four models, the XGB Regressor achieved the highest score (R-squared = 0.95) for mobile phone price prediction. In short, prioritizing the XGB Regressor methodology in future mobile phone price prediction can improve prediction accuracy.
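The study ranks its models by R^2. The sketch below computes R^2 in plain Python as 1 - SS_res / SS_tot. The prices are hypothetical, not the study's data.

- A perfect fit scores 1.
- A model that always predicts the mean of the true values scores 0.
- A model worse than that scores below 0.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical phone prices and predictions, for illustration only.
prices = [199.0, 349.0, 499.0, 899.0]
print(r2_score(prices, prices))                  # 1.0 (perfect fit)
print(r2_score(prices, [486.5] * len(prices)))   # 0.0 (always predicts the mean, 486.5)
print(r2_score(prices, [250.0, 300.0, 550.0, 850.0]))
```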


2

Shivraj, R., S. Vikas, Abhishek MN Naga, Kumar GN Naveen, Deepak NR Dr., and B. Ompraksash. "Prediction of Stock Market Performance Analysis by Using Machine Learning Regressor Techniques." Recent Trends in Computer Graphics and Multimedia Technology 7, no. 2 (2025): 11–21. https://doi.org/10.5281/zenodo.15331553.


Abstract:

Stock market prediction is a widely researched and crucial topic for investors, traders, and financial analysts. Accurately predicting stock price fluctuations can aid in making informed decisions about buying or selling stocks. One such approach is sentiment analysis, which has emerged as a popular method for predicting stock prices. This research employs machine learning methods to enhance the accuracy of stock market predictions. It analyzes the efficiency of five advanced machine learning regression models that are widely used for regression tasks:

- Bagging Regressor
- XGB Regressor
- LGBM Regressor
- Hist Gradient Boosting Regressor
- AdaBoost Regressor

Of these models, the Bagging Regressor delivered the best performance, with an R-squared score of 99.9774 and a minimal RMSE of 8.305. It also proved computationally efficient, completing the task in just 0.1857 milliseconds. These results emphasize the dependability and effectiveness of the Bagging Regressor for stock market prediction and for providing meaningful insights for financial modeling and decision-making.


3

Al Khairi, Said, Ahmad Rio Adriansyah, and Lukman Rosyidi. "Perbandingan XGB Regressor dengan Algoritma Lain untuk Prediksi Tarif Tol" [Comparison of the XGB Regressor with Other Algorithms for Toll Fare Prediction]. DBESTI: Journal of Digital Business and Technology Innovation 2, no. 1 (2025): 127–32. https://doi.org/10.54914/dbesti.v2i1.1477.


Abstract:

In recent years, toll roads in Indonesia have developed rapidly. Many toll roads have been built to ease traffic in developed areas and to improve the distribution of goods and services in support of economic growth. Toll roads also play an important role in efforts to improve connectivity between cities and regions and to accelerate public mobility. Indonesians already enjoy many benefits of toll roads. For example, the Jagorawi toll road eases traffic and thus shortens travel times between regions.

The aim of this study is to build a machine learning model that predicts toll road fares. The model is meant to provide a reference for the public, help optimize toll fares in Indonesia, and offer input on toll fares for consideration by the relevant government bodies. The study takes a quantitative approach, using linear regression with the XGB Regressor algorithm. The resulting toll fare prediction model is fairly accurate: the accuracy test using the root mean squared error (RMSE) metric gives a value of 3390.691. Testing also shows several predicted fares matching the actual fares.
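RMSE, the metric reported above, is the square root of the mean squared prediction error. It is expressed in the units of the target, so the reported 3390.691 should be read against the size of typical toll fares. The sketch below is in plain Python and uses hypothetical fares, not the study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the same units as the target."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical toll fares and predictions (not data from the study).
actual = [10000, 12500, 15000]
predicted = [13000, 12500, 11000]
print(rmse(actual, predicted))  # errors of -3000, 0 and +4000 give sqrt(25e6 / 3)
```

scikit-learn's `mean_squared_error` computes the same mean squared error; taking its square root gives RMSE.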


4

Khan, Tarana, Urfi Khan, Adnan Khan, Calahan Mollan, Inga Morkvenaite-Vilkonciene, and Vijitashwa Pandey. "Data-Driven Digital Twin Framework for Predictive Maintenance of Smart Manufacturing Systems." Machines 13, no. 6 (2025): 481. https://doi.org/10.3390/machines13060481.


Abstract:

A digital twin (DT) enables the acquisition and subsequent analysis of large amounts of process data. Various machine learning (ML) algorithms exist for analysis and prediction that can be used in this scenario. However, there is very little understanding of the relative merits of these methods. This paper proposes a DT framework in the context of predictive maintenance in smart manufacturing to compare the prediction efficacy of prevalent ML models. Data-driven models were developed using machine learning algorithms to predict surface roughness and power consumption during a CNC turning operation. Three process parameters (cutting velocity, feed rate, and depth of cut) and two dependent parameters (surface roughness and power consumption) were selected for model development. Seven ML algorithms were tested for each response parameter: Linear Regression, XGB Regressor, Random Forest Regressor, Average Ensemble, AdaBoost Regressor, SVR, and MLP.

The comparative analysis showed the following best models:

- Surface roughness: Random Forest Regressor, with the highest R2 (94.2% ± 2.4%), lowest MAE (0.011 ± 0.002), lowest MAPE (15.6% ± 4.0%), and lowest RMSE (0.017 ± 0.003).
- Power consumption: XGB Regressor, with the highest R2 (98.9% ± 0.5%), lowest MAE (22.513 ± 4.424), lowest MAPE (3.0% ± 0.5%), and lowest RMSE (42.650 ± 8.933).

The best-performing machine learning algorithm was subsequently used in the data-driven models, helping to achieve an improved surface finish. This enables predictive maintenance, reducing energy usage for more sustainable production.
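The results above are reported as "mean ± spread", presumably over repeated runs or cross-validation folds. The abstract does not say which, so the sketch below makes two assumptions:

- Per-fold scores are summarized by their mean and sample standard deviation.
- MAPE is computed in percent, and it is undefined when any true value is zero.

The fold data are hypothetical.

```python
import statistics


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(scores):
    """Format per-fold scores as 'mean ± sample standard deviation'."""
    return f"{statistics.mean(scores):.3f} ± {statistics.stdev(scores):.3f}"


# Hypothetical (true, predicted) power values for three folds, for illustration only.
fold_mapes = [
    mape([100.0, 200.0], [97.0, 206.0]),   # 3.0 %
    mape([150.0, 300.0], [156.0, 291.0]),  # 3.5 %
    mape([120.0, 240.0], [120.0, 252.0]),  # 2.5 %
]
print(summarize(fold_mapes))  # 3.000 ± 0.500
```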


5

Amiri, Ahmed Faris, Aissa Chouder, Houcine Oudira, Santiago Silvestre, and Sofiane Kichou. "Improving Photovoltaic Power Prediction: Insights through Computational Modeling and Feature Selection." Energies 17, no. 13 (2024): 3078. http://dx.doi.org/10.3390/en17133078.


Abstract:

This work identifies the most effective machine learning techniques and supervised learning models for precisely estimating the power output of photovoltaic (PV) plants. Using experimental data, the performance of various regression models is analyzed, including Random Forest regressor, Support Vector Regression (SVR), Multi-layer Perceptron regressor (MLP), Linear regressor (LR), Gradient Boosting, k-Nearest Neighbors regressor (KNN), Ridge regressor (Rr), Lasso regressor (Lsr), Polynomial regressor (Plr), and XGBoost regressor (XGB).

The methodology starts with meticulous data preprocessing to ensure dataset integrity. The preprocessing phase entails eliminating missing values and outliers using Isolation Forest. After it, feature selection based on a correlation threshold is performed to identify the parameters relevant to accurate prediction in PV systems. Subsequently, Isolation Forest is employed for outlier detection, followed by model training and evaluation using key performance metrics:

- Root-Mean-Squared Error (RMSE)
- Normalized Root-Mean-Squared Error (NRMSE)
- Mean Absolute Error (MAE)
- R-squared (R2)
- Integral Absolute Error (IAE)
- Standard Deviation of the Difference (SDD)

Among the models evaluated, Random Forest emerges as the top performer, with an RMSE of 19.413, an NRMSE of 0.048%, and an R2 score of 0.968. Furthermore, the Random Forest regressor (the best-performing model) is integrated into a MATLAB application for real-time predictions, enhancing its usability and accessibility for a wide range of renewable energy applications.
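The abstract does not give the correlation threshold or the exact selection rule. One common reading of "feature selection based on a correlation threshold" is to keep each feature whose absolute Pearson correlation with the target meets a cutoff. The sketch below assumes that rule, a cutoff of 0.5, and hypothetical feature names and values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def select_by_correlation(features, target, threshold=0.5):
    """Keep feature names whose |Pearson r| with the target is at least the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= threshold]


# Hypothetical PV measurements, for illustration only.
features = {
    "irradiance": [100, 300, 500, 700, 900],
    "module_temp": [20, 28, 35, 41, 50],
    "wind_speed": [3.0, 1.0, 4.0, 1.0, 5.0],
}
power = [10, 31, 49, 72, 90]
print(select_by_correlation(features, power))  # ['irradiance', 'module_temp']
```

A filter like this looks at one feature at a time. It can therefore keep redundant, mutually correlated inputs, such as irradiance and module temperature in this toy example.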


6

K, Amrutha. "Regression Modeling Approaches for Red Wine Quality Prediction: Individual and Ensemble." International Journal for Research in Applied Science and Engineering Technology 11, no. 6 (2023): 3621–27. http://dx.doi.org/10.22214/ijraset.2023.54363.


Abstract:

This paper compares the performance of several regression models, and of combinations of regression and ensemble models, in predicting the quality of red wine. It uses the wine quality dataset from the UCI Machine Learning Repository. The dataset consists of white and red vinho verde wines from northern Portugal, with 6,497 samples. Before training the models, the dataset undergoes appropriate preprocessing steps to ensure data quality and consistency.

Five regression algorithms are trained and tested on the dataset:

- Linear Regression (LR)
- Random Forest Regressor (RF)
- Support Vector Regression (SVR)
- Decision Tree Regressor (DT)
- Multi-layer Perceptron Regressor (MLP)

Additionally, the predictions of these individual regression models are combined with four ensemble models: XGBRegressor (XGB), AdaBoostRegressor (ABR), BaggingRegressor (BR), and GradientBoostingRegressor (GRB).

Among the individual models, Random Forest (RF) performs best, with the lowest MAE, MSE, and RMSE values and the highest R2 score. This suggests that RF fits the red wine quality dataset better than the other regression models. However, the combination of Random Forest with Bagging Regressor (RF and BR) outperforms the individual models, with lower errors and a relatively higher R2 score.
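The abstract does not say how the RF and BR predictions were combined. A simple, optionally weighted, average of per-sample predictions is one common way to combine models. The sketch below assumes that scheme, so it should not be read as the paper's method. The predictions are hypothetical.

```python
def average_ensemble(prediction_lists, weights=None):
    """Combine several models' predictions sample by sample using a (weighted) mean."""
    if not prediction_lists:
        raise ValueError("need at least one list of predictions")
    n = len(prediction_lists[0])
    if any(len(preds) != n for preds in prediction_lists):
        raise ValueError("all prediction lists must have the same length")
    if weights is None:
        weights = [1.0] * len(prediction_lists)  # plain (unweighted) average
    if len(weights) != len(prediction_lists) or sum(weights) <= 0:
        raise ValueError("need one weight per model, with a positive sum")
    total = sum(weights)
    return [
        sum(w * preds[i] for w, preds in zip(weights, prediction_lists)) / total
        for i in range(n)
    ]


# Hypothetical quality-score predictions from two models for three wines.
rf_pred = [5.2, 6.1, 5.8]
br_pred = [5.0, 6.3, 5.6]
print(average_ensemble([rf_pred, br_pred]))                  # roughly [5.1, 6.2, 5.7]
print(average_ensemble([rf_pred, br_pred], weights=[3, 1]))  # leans toward rf_pred
```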


7

Geyikoğlu, Ali, and Mete Yağanoğlu. "Makine Öğrenmesi Algoritmaları ile Elektrik Dağıtım Şebekeleri Arıza Tahmini" [Fault Prediction in Electricity Distribution Networks Using Machine Learning Algorithms]. Karadeniz Fen Bilimleri Dergisi 15, no. 1 (2025): 73–98. https://doi.org/10.31466/kfbd.1482179.


Abstract:

Faults in electricity distribution networks are defined as factors that impede a high-quality, continuous flow of energy. After a fault occurs, electricity distribution companies carry out corrective actions through maintenance, repair, and investment work. The systems generate technical quality parameters from the faults that occur and from the subsequent corrective actions. However, the resulting technical data are not used in any forecasting infrastructure, and corrective actions are generally carried out on the basis of interpretation and requests.

To move away from such heuristic approaches, this study uses two data sources:

- Aras EDAŞ outage duration and frequency data, sampled and recorded by the systems after the field activities of the distribution company's operators.
- Meteorological data for the corresponding periods for the 7 provinces within Aras EDAŞ's operational responsibility area.

Data preprocessing, feature selection, and feature extraction were performed on the attributes and classes in the dataset. The datasets prepared for regression-based prediction were split into 80% training and 20% test data. They were then processed with 8 regression models: Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGB), Support Vector, Random Forest, Categorical Boosting, k-Nearest Neighbors, Decision Tree, and Linear. The multi-class values of the two dependent variables in the dataset were modeled separately, giving a total of 16 regression runs across the 8 models. Hyperparameter optimization was applied to obtain the best model structure. For the primary multi-class regression prediction, the best model accuracy, 93.305%, was obtained with the LGBM Regressor. For the secondary multi-class prediction, the best model accuracy, 95.812%, was obtained with the XGB Regressor.
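The 80%/20% training/test split described above can be sketched in plain Python as a seeded random shuffle of row indices. The function name, the seed, and the random (rather than chronological) split are illustrative assumptions. The abstract does not state how the rows were assigned.

```python
import random


def shuffle_split(rows, test_fraction=0.2, seed=42):
    """Randomly split rows into (train, test); test gets round(n * test_fraction) rows."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


records = list(range(100))  # hypothetical stand-ins for outage/weather records
train, test = shuffle_split(records)
print(len(train), len(test))  # 80 20
```

In a library workflow, scikit-learn's `train_test_split` with `test_size=0.2` and a fixed `random_state` does the same job. For time-ordered records such as outage histories, a chronological split (train on earlier periods, test on later ones) may be the safer choice because it avoids leaking future information into training.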


8

Mona, Mona, El-Sayed M. El-kenawy, Mohamed Gamal Abdel-Fattah, Islam Ismael, and Hossam El Deen Salah Mostafa. "Machine Learning Models with Statistical Analysis Techniques for Forecasting Wind Turbines Scada Systems Measurement." Fusion: Practice and Applications 19, no. 2 (2025): 64–81. https://doi.org/10.54216/fpa.190205.


Abstract:

Wind energy is one of the fastest-growing sustainable, clean, and renewable energy sources, attracting significant attention and investment from many countries. However, wind power plants require substantial capital investment, so a proposed plant's performance must be understood before implementation. This assessment is most effectively conducted using refined wind power prediction models and precise wind velocity data. Accurate wind forecasts are essential for informed decision-making and effective wind energy utilization.

In this study, three advanced machine learning (ML) regression methods were applied to the TNWind dataset to predict the power output of wind turbines. The dataset variables included:

- date and time (measured at 10-minute intervals)
- low-voltage active power (in kW)
- wind speed (in m/s)
- the theoretical wind power curve (in kWh)
- wind direction

To predict wind power output, six supervised ML models were trained: Random Forest Regressor (RF), Extreme Gradient Boosting Regressor (XGB), Gradient Boosting Regressor (GB), Support Vector Machine Regressor (SVR), K-Neighbors Regressor (KN), and Linear Regressor. The Random Forest model outperformed the others, with an R2 value of 0.97, an MAE of 0.17, and an MSE of 0.07. The authors conclude that this machine learning analysis of wind power generation shows renewable energy to be a capable and lucrative investment.


9

Meng, Jing-Bi, Zai-Jian An, and Chun-Shan Jiang. "Machine learning-based prediction of LDL cholesterol: performance evaluation and validation." PeerJ 13 (April 9, 2025): e19248. https://doi.org/10.7717/peerj.19248.


Abstract:

Objective: This study aimed to validate and optimize a machine learning algorithm for accurately predicting low-density lipoprotein cholesterol (LDL-C) levels, addressing limitations of traditional formulas, particularly in hypertriglyceridemia.

Methods: Various machine learning models were compared to conventional formulas (Friedewald, Martin, and Sampson) using lipid profiles from 120,174 subjects (2020–2023). The models were linear regression, K-nearest neighbors (KNN), decision tree, random forest, eXtreme Gradient Boosting (XGB), and multilayer perceptron (MLP) regressor. Predictive performance was evaluated using R-squared (R2), mean squared error (MSE), and Pearson correlation coefficient (PCC) against measured LDL-C values.

Results: Machine learning models outperformed traditional methods, with Random Forest and XGB achieving the highest accuracy (R2 = 0.94, MSE = 89.25) on the internal dataset. Among the traditional formulas, the Sampson method performed best but showed reduced accuracy in high triglyceride (TG) groups (TG > 300 mg/dL). Machine learning models maintained high predictive power across all TG levels.

Conclusion: Machine learning models offer more accurate LDL-C estimates, especially in high TG contexts where traditional formulas are less reliable. These models could enhance cardiovascular risk assessment by providing more precise LDL-C estimates, potentially leading to more informed treatment decisions and improved patient outcomes.


10

Petridis, Christos, and Michael Vassilakopoulos. "Detecting Hull Fouling using Machine Learning Algorithms trained on Ship Propulsion Data to Improve Resource Management and Increase Environmental Benefits." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences X-4/W4-2024 (May 31, 2024): 185–92. http://dx.doi.org/10.5194/isprs-annals-x-4-w4-2024-185-2024.


Abstract:

This study aims to develop a methodology for assessing hull fouling based on ship propulsion data, such as speed and draft, and on weather-related data. Hull fouling is unavoidable in ships and increases fuel consumption, so the maintenance frequency has to be optimal. So far this task has relied primarily on empirical rules, but it turns out that machine learning techniques can improve on them. Using data from clean-hull ships, we aim to isolate and consider only the effect of the weather in this study. Our goal is to replace empirical rules with machine learning, since the vast amount of data we possess can significantly aid this endeavor.

The task turns out to be a regression problem. We therefore experiment with several supervised algorithms using k-fold cross-validation and finally select models based on ensemble methods or artificial neural networks. We propose the MLP Regressor, Random Forest Regressor, and XGB Regressor as candidates, since all of them yielded very good results on several performance metrics. The timely detection of hull fouling can provide substantial benefits in resource management and environmental sustainability.
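The k-fold cross-validation mentioned above can be sketched in plain Python as a generator of (train, test) index lists. It uses k contiguous folds, and the first n % k folds get one extra sample. As far as we know, this matches the fold layout of scikit-learn's `KFold` with `shuffle=False`. The paper's value of k and its shuffling choice are not stated, and the target values below are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous, non-shuffled folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Usage: score a baseline that predicts the training-fold mean (a stand-in for a real model).
y = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]  # hypothetical targets, e.g. fuel consumption
fold_mae = []
for train, test in k_fold_indices(len(y), k=3):
    baseline = sum(y[i] for i in train) / len(train)
    fold_mae.append(sum(abs(y[i] - baseline) for i in test) / len(test))
print(sum(fold_mae) / len(fold_mae))  # mean MAE across the 3 folds
```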


11

Nguyen, Huu Nam, Quoc Thanh Tran, Canh Tung Ngo, Duc Dam Nguyen, and Van Quan Tran. "Solar energy prediction through machine learning models: A comparative analysis of regressor algorithms." PLOS ONE 20, no. 1 (2025): e0315955. https://doi.org/10.1371/journal.pone.0315955.


Abstract:

Solar energy generated by photovoltaic panels is an important energy source that brings many benefits to people and the environment. Its use is a growing global trend and will play an increasingly important role in the future of the energy industry. However, its intermittent nature and potential for distributed use require accurate forecasting to balance supply and demand, optimize energy storage, and manage grid stability. In this study, five machine learning models were used: Gradient Boosting Regressor (GB), XGB Regressor (XGBoost), K-neighbors Regressor (KNN), LGBM Regressor (LightGBM), and CatBoost Regressor (CatBoost). The models were built on a dataset of 21,045 samples, with humidity, ambient temperature, wind speed, visibility, cloud ceiling, and pressure as inputs for forecasting solar energy. Model accuracy is assessed and compared using the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE).

The CatBoost model is the frontrunner in predicting solar energy:

- training: R2 = 0.608, RMSE = 4.478 W, and MAE = 3.367 W
- testing: R2 = 0.46, RMSE = 4.748 W, and MAE = 3.583 W

SHAP analysis reveals that ambient temperature and humidity have the greatest influence on the solar energy generated by photovoltaic panels.


12

N., Baggyalakshmi, Anugrahaa V., and Revathi R. "Analyzing Restaurant Reviews to Predict Customer Satisfaction Trends." International Academic Journal of Science and Engineering 10, no. 2 (2023): 106–13. http://dx.doi.org/10.9756/iajse/v10i2/iajse1014.


Abstract:

Numerous studies comparing the quality of various restaurants' cuisines have been conducted. Area, average cost for two people, votes, cuisines, mostly flavour, and restaurant type are some of the criteria used to evaluate restaurants. The primary objective here is to find out which restaurants people like to eat at and to read reviews of those places. The goal of this study is to develop a model that can predict whether a review of a restaurant will be positive or negative. A number of prediction algorithms are used to achieve this goal, including Multinomial Naive Bayes, SVC, XGB Regressor, Pipeline, and Logistic Regression. Finally, the study aims to identify the "best" model for predicting the reviewer's emotional tone.


13

Bhuyar, Vrushali, and Sachin N. Deshmukh. "Enhancing plagiarism detection using data pre-processing and machine learning approach." IAES International Journal of Artificial Intelligence (IJ-AI) 14, no. 3 (2025): 1940. https://doi.org/10.11591/ijai.v14.i3.pp1940-1950.


Abstract:

Modern technology and the internet have made academic information more accessible, but this has also raised global concern about plagiarism. Researchers are actively exploring machine learning as a promising solution for plagiarism detection. This study underscores the importance of robust data preprocessing for optimal machine learning performance. It applies machine learning to detect plagiarism using a dataset of 67 research papers, Big Five personality factors (OCEAN), and plagiarism rates. The algorithms were trained on an 80% training subset, and their ability to generalize was then evaluated on the remaining 20% in the testing phase. The corresponding root mean squared errors (RMSE) were as follows:

- random forest regressor: 9.48
- bagging regressor: 10.66
- gradient boosting regressor: 11.79
- XGB regressor: 12.53
- AdaBoost regressor: 12.79

This research contributes a plagiarism detection model that integrates outlier detection, normalization, missing value imputation, and feature selection. Its distinctive aspect is the combination of feature selection with missing value imputation, which the authors report surpasses previous benchmarks and improves precision and efficiency. The authors liken the approach to assembling puzzle pieces, with each preprocessing step contributing to the performance of the plagiarism detection model.


14

Khandelwal, Kapil, and Ajay K. Dalai. "Prediction of Individual Gas Yields of Supercritical Water Gasification of Lignocellulosic Biomass by Machine Learning Models." Molecules 29, no. 10 (2024): 2337. http://dx.doi.org/10.3390/molecules29102337.


Abstract:

Supercritical water gasification (SCWG) of lignocellulosic biomass is a promising pathway for hydrogen production. However, SCWG is a complex thermochemical process that is challenging to model with conventional methodologies. The study therefore developed and evaluated eight machine learning models for predicting H2, CO, CO2, and CH4 gas yields from SCWG of lignocellulosic biomass:

- linear regression (LR)
- Gaussian process regression (GPR)
- artificial neural network (ANN)
- support vector machine (SVM)
- decision tree (DT)
- random forest (RF)
- extreme gradient boosting (XGB)
- categorical boosting regressor (CatBoost)

The models were combined with particle swarm optimization (PSO) and genetic algorithm (GA) optimizers. A total of 12 input features were used to predict gas yields from 166 data points. These covered SCWG process conditions (temperature, time, concentration, pressure) and biomass properties (C, H, N, S, VM, moisture, ash, real feed). Among the machine learning models, boosting ensemble tree models such as XGB and CatBoost showed the highest predictive power for gas yields. PSO-optimized XGB was the best-performing model for H2 yield, with a test R2 of 0.84. PSO-optimized CatBoost was best for predicting CH4, CO, and CO2 yields, with test R2 values of 0.83, 0.94, and 0.92, respectively. For all gas yields, the PSO optimizer improved the predictive ability of the unoptimized machine learning models more than the GA optimizer did.

Feature analysis using Shapley additive explanations (SHAP) on the best-performing models identified the most dominant feature for each gas yield:

- H2: temperature (21.93%)
- CH4: C (24.85%)
- CO: ash (16.93%)
- CO2: C (29.73%)
Even though temperature was the most dominant feature, the cumulative feature importance of biomass characteristics variables (C, H, N, S, VM, moisture, ash, real feed) as a group was higher than that of the SCWG process condition variables (temperature, time, concentration, pressure) for the prediction of all gas yields. SHAP two-way analysis confirmed the strong interactive behavior of input features on the prediction of gas yields.
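The abstract reports feature importance both as per-feature percentages and as cumulative group totals. One common way to obtain such numbers works in three steps, though the abstract does not confirm that the authors did it this way:

1. Average the absolute SHAP values of each feature over all samples.
2. Rescale these averages so they sum to 100%.
3. Sum the member shares to get a group's share.

The SHAP values themselves would come from an explainer run on the fitted model, for example the `shap` library's `TreeExplainer`. The values below are hypothetical; only the feature names come from the abstract.

```python
def shap_importance_shares(shap_rows, feature_names):
    """Percent share of each feature in the total mean |SHAP| importance.

    shap_rows: one list of SHAP values per sample, ordered like feature_names.
    """
    if not shap_rows or any(len(row) != len(feature_names) for row in shap_rows):
        raise ValueError("each row needs one SHAP value per feature")
    n = len(shap_rows)
    mean_abs = [sum(abs(row[j]) for row in shap_rows) / n for j in range(len(feature_names))]
    total = sum(mean_abs)
    if total == 0:
        raise ValueError("all SHAP values are zero")
    return {name: 100.0 * m / total for name, m in zip(feature_names, mean_abs)}


def group_share(shares, group):
    """Cumulative share of a group of features (e.g. biomass properties)."""
    return sum(shares[name] for name in group)


# Hypothetical SHAP values for three samples and four features (not from the paper).
names = ["temperature", "time", "C", "ash"]
rows = [[2.0, -0.5, 1.0, -0.5],
        [-1.0, 0.5, 1.5, 0.0],
        [1.5, 0.0, -0.5, 1.0]]
shares = shap_importance_shares(rows, names)
print(shares)                              # about 45 % / 10 % / 30 % / 15 %
print(group_share(shares, ["C", "ash"]))   # about 45 % for the two biomass features
```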


15

Al-Taher, Rogaia H., Mohamed E. Abuarab, Abd Al-Rahman S. Ahmed, et al. "Predicting Green Water Footprint of Sugarcane Crop Using Multi-Source Data-Based and Hybrid Machine Learning Algorithms in White Nile State, Sudan." Water 16, no. 22 (2024): 3241. http://dx.doi.org/10.3390/w16223241.


Abstract:

Water scarcity and climate change present substantial obstacles for Sudan, resulting in extensive migration. This study evaluates how effectively machine learning models forecast the green water footprint (GWFP) of sugarcane in the context of climate change. It analyzes input variables such as climatic conditions, agricultural data, and remote sensing metrics, and investigates their effects over the sugarcane cultivation period from 2001 to 2020. Seven models were applied: random forest (RF), extreme gradient boosting (XGBoost), and support vector regressor (SVR), plus the hybrid combinations RF-XGB, RF-SVR, XGB-SVR, and RF-XGB-SVR. They were run across five scenarios (Sc), each using a different combination of the study variables.

The main results are:

- The largest mean bias error (MBE) was recorded for RF with Sc3 (remote sensing parameters), at 5.14 m³ ton⁻¹, followed closely by RF-SVR at 5.05 m³ ton⁻¹. The minimum MBE was 0.03 m³ ton⁻¹, for RF-SVR with Sc1 (all parameters).
- SVR exhibited the highest R2 values across all scenarios. Notably, the R2 values of the dual hybrid models surpassed those of the triple hybrid models.
- The highest Nash–Sutcliffe efficiency (NSE), 0.98, was observed for XGB-SVR in Sc2 (climatic parameters). The lowest NSE, 0.09, was linked to SVR in Sc3.
- The root mean square error (RMSE) varied across ML models and scenarios. Sc3, based on the remote sensing parameters (EVI, NDVI, SAVI, and NDWI), showed the weakest performance.
- Effective precipitation exerted the greatest influence on GWFP, contributing 81.67%, followed by relative humidity (RH) at 7.5% and Tmax at 5.24%.

The study concludes that individual models were as proficient as, or occasionally better than, double and triple hybrid models in predicting GWFP for sugarcane.
Moreover, remote sensing indices demonstrated minimal positive influence on GWFP prediction, with Sc3 producing the lowest statistical outcomes across all models. Consequently, the study advocates for the use of hybrid models to mitigate the error term in the prediction of sugarcane GWFP.
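NSE and MBE, two of the metrics above, can be sketched in plain Python. NSE has the same 1 - SS_res / SS_tot form as the usual definition of R^2. Some studies instead report R^2 as the squared Pearson correlation, which can differ from NSE, and the abstract does not say which definition it uses. The MBE sign convention (simulated minus observed) is also an assumption, since papers differ. The footprint values are hypothetical.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean_obs)^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    if den == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - num / den


def mbe(observed, simulated):
    """Mean bias error as mean(simulated - observed); positive means over-prediction.

    Sign conventions differ between papers; some use observed - simulated.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(s - o for o, s in zip(observed, simulated)) / len(observed)


# Hypothetical green water footprints (m3 per ton) and model outputs.
obs = [120.0, 95.0, 140.0, 110.0]
sim = [118.0, 101.0, 133.0, 112.0]
print(nse(obs, sim), mbe(obs, sim))  # MBE is -0.25 (slight under-prediction)
```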


16

Habeeb, Fadya A., Mustafa Abdulfattah Habeeb, Yahya Layth Khaleel, and Fatimah N. Ameen. "Global Analysis and Prediction of CO2 and Greenhouse Gas Emissions across Continents." Applied Data Science and Analysis 2024 (November 25, 2024): 173–88. https://doi.org/10.58496/adsa/2024/014.


Abstract:

Understanding the concentrations of carbon dioxide (CO2) and greenhouse gases is very important for addressing climate change. These emissions are the major cause of global warming, which in turn has many effects on the environment, the economy, and society. Emission prediction models must therefore be precise to help policymakers plan for the future effects of climate change. By evaluating the emission data of different continents, this paper seeks to identify patterns and findings that can help reduce emissions worldwide. The dataset contains emission data and geographic information from several countries and allows several ML models to be compared.

The models reviewed in this study are:

- linear regression (LR)
- decision tree regression (DT)
- random forest regression (RF)
- support vector regression (SVR)
- k-nearest neighbor regression (KNN)
- the XGB regressor
- the gradient boosting regressor
- Ridge
- Lasso

Among the models, the gradient boosting regressor was found to have the best prediction capability, with an R-squared value of 0. The highest value of the mean absolute error (MAE) was 929, and the lowest mean squared error (MSE) was 2535.30. This model outperforms the others because of its excellent ability to identify complex interactions between the input variables and emissions. The conclusions stress the potential of ensemble methods, such as gradient boosting, for emission forecasting, for the benefit of researchers and policymakers. This work is a modest contribution to the ongoing global effort to understand and curb CO2 and greenhouse gas emissions for effective decision-making.


17

Anker, Marvin, Christine Borsum, Youfeng Zhang, Yanyan Zhang, and Christian Krupitzer. "Using a Machine Learning Regression Approach to Predict the Aroma Partitioning in Dairy Matrices." Processes 12, no. 2 (2024): 266. http://dx.doi.org/10.3390/pr12020266.


Abstract:

Aroma partitioning in food is a challenging area of research because several physical and chemical factors affect the binding and release of aroma in food matrices. The partition coefficient Kmg describes how aroma compounds distribute themselves between a matrix and a gas phase, for example between the components of a food matrix and air. This study introduces a regression approach to predict the Kmg value of aroma compounds with a wide range of physicochemical properties. The dairy matrices studied represent products of different compositions and/or processing. The approach consists of four steps:

1. data cleaning
2. grouping based on the temperature of Kmg analysis
3. preprocessing (log transformation and normalization)
4. development and evaluation of prediction models with regression methods

Linear regression (LR) was compared with five machine-learning-based regression algorithms: Random Forest Regressor (RFR), Gradient Boosting Regression (GBR), Extreme Gradient Boosting (XGBoost, XGB), Support Vector Regression (SVR), and Artificial Neural Network Regression (NNR). Explainable AI (XAI) was used to calculate feature importance and thereby identify the features that contribute most to the prediction. The top three features identified are log P, specific gravity, and molecular weight. For the prediction of Kmg in dairy matrices, R2 scores of up to 0.99 were reached. At 37.0 °C, which resembles the temperature of the mouth, RFR delivered the best results. At 7.0 °C, typical of a household fridge, XGB performed best. The results serve as a proof of concept and show that a data-driven machine learning approach can predict the Kmg value of aroma compounds in different dairy matrices.
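The preprocessing step above (log transformation, then normalization) can be sketched in plain Python. Three choices here are assumptions, since the abstract does not specify them:

- base-10 logarithms
- min-max scaling to [0, 1]
- scaling bounds fitted on training values only

The Kmg values are hypothetical.

```python
import math


def log_transform(values):
    """Base-10 log; requires strictly positive inputs such as partition coefficients."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values")
    return [math.log10(v) for v in values]


def fit_min_max(values):
    """Learn the (min, max) scaling range, ideally from training data only."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1 (test values may fall outside)."""
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


train_kmg = [0.001, 0.01, 0.1, 1.0]  # hypothetical Kmg values spanning several decades
test_kmg = [0.05]
lo, hi = fit_min_max(log_transform(train_kmg))
print(apply_min_max(log_transform(train_kmg), lo, hi))  # about [0, 1/3, 2/3, 1]
print(apply_min_max(log_transform(test_kmg), lo, hi))
```

The base of the logarithm does not affect the result after min-max scaling. Changing the base multiplies every log value by the same positive constant, and that constant cancels in (v - lo) / (hi - lo).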

18

Baressi Šegota, Sandi, Mario Ključević, Dario Ogrizović, and Zlatan Car. "Modeling of Actuation Force, Pressure and Contraction of Fluidic Muscles Based on Machine Learning." Technologies 12, no. 9 (2024): 161. http://dx.doi.org/10.3390/technologies12090161.

Abstract:

In this paper, a dataset is compiled from the fluidic muscle datasheet. This dataset is then used to train models predicting the pressure, force, and contraction length of the fluidic muscle as three separate outputs. The modeling is performed with four algorithms: extreme gradient boosted trees (XGB), ElasticNet (ENet), support vector regressor (SVR), and a multilayer perceptron (MLP) artificial neural network. Each of the four fluidic muscle models (5–100 N, 10–100 N, 20–200 N, 40–400 N) is first modeled separately for later comparison; then the combined dataset, consisting of data from all the listed datasets, is used for training. The results show that good regression performance can be achieved with the listed algorithms, especially with the general model, which performs better than the individual models. Still, room for improvement exists because of the high variance of the results across validation sets, possibly caused by non-normal data distributions.

19

Zhang, Tianjie, Alex Smith, Huachun Zhai, and Yang Lu. "LSTM+MA: A Time-Series Model for Predicting Pavement IRI." Infrastructures 10, no. 1 (2025): 10. https://doi.org/10.3390/infrastructures10010010.

Abstract:

Accurate prediction of pavement performance is essential for transportation administration and management to allocate resources for road maintenance and upkeep appropriately. The international roughness index (IRI) is one of the most commonly used pavement performance indicators and reflects surface roughness. However, existing research on IRI prediction mainly uses linear regression or traditional machine learning, which cannot account for the historical effects on IRI of climate, traffic, pavement construction, and intermittent maintenance. In this work, a long short-term memory (LSTM)-based model, LSTM+MA, is proposed to predict pavement IRI using time-series data extracted from the long-term pavement performance (LTPP) dataset. Effective preprocessing methods and hyperparameter fine-tuning are selected to improve the accuracy of the model. The performance of LSTM+MA is compared with other state-of-the-art models, including logistic regressor (LR), support vector regressor (SVR), random forest (RF), K-nearest-neighbor regressor (KNR), fully connected neural network (FNN), XGBoost (XGB), recurrent neural network (RNN), and LSTM. The results show that the selected preprocessing methods help the model learn quickly from the data and reach high accuracy within a small number of epochs. They also show that the proposed LSTM+MA model significantly outperforms the other models, with an R² of 0.965 and a mean squared error (MSE) of 0.030 on the test datasets. Moreover, an overfitting score is proposed in this work to represent the severity of overfitting, and it shows that the proposed model does not suffer severely from overfitting.

20

Saha, Sajal, Anwar Haque, and Greg Sidebottom. "Multi-Step Internet Traffic Forecasting Models with Variable Forecast Horizons for Proactive Network Management." Sensors 24, no. 6 (2024): 1871. http://dx.doi.org/10.3390/s24061871.

Abstract:

The ISP (Internet Service Provider) industry relies heavily on internet traffic forecasting (ITF) for long-term business strategy planning and proactive network management. Effective ITF frameworks are necessary to manage these networks and prevent network congestion and over-provisioning. This study introduces an ITF model designed for proactive network management. It innovatively combines outlier detection and mitigation techniques with advanced gradient descent and boosting algorithms, including Gradient Boosting Regressor (GBR), Extreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGB), CatBoost Regressor (CBR), and Stochastic Gradient Descent (SGD). In contrast to traditional methods that rely on synthetic datasets, our model addresses the problems caused by real aberrant ISP traffic data. We evaluated our model across varying forecast horizons—six, nine, and twelve steps—demonstrating its adaptability and superior predictive accuracy compared to traditional forecasting models. The integration of the outlier detection and mitigation module significantly enhances the model’s performance, ensuring robust and accurate predictions even in the presence of data volatility and anomalies. To guarantee that our suggested model works in real-world situations, our research is based on an extensive experimental setup that uses real internet traffic monitoring from high-speed ISP networks.

21

Bhuva, Prashant, and Ankur Bhogayata. "Predicting Compressive Strength of Self-Compacting Concrete Using Machine and Deep Learning Models." Journal of Information Systems Engineering and Management 10, no. 28s (2025): 334–47. https://doi.org/10.52783/jisem.v10i28s.4334.

Abstract:

This paper discusses the prediction of the compressive strength of self-compacting concrete (SCC) with a range of machine learning (ML) and deep learning (DL) models. The models considered include Random Forest (RF), Keras Regressor (KR), Extremely Randomized Trees (ERT), Extreme Gradient Boosting (XGB), Gradient Boosting (GB), Light Gradient Boosting Machine (LGBM), and Category Boosting (CB), many of which are ensemble methods. In addition, the ability of several models to predict the compressive strength of SCC was examined with generalized additive models such as the Gradient Boosting Regressor and Keras-based neural networks. The dataset was compiled from twenty papers and divided into three subsets for training, validation, and testing. The principal input parameters used in model building are superplasticizers, cement, water, fine aggregates, coarse aggregates, and mineral admixtures. To check the accuracy of each model, performance indicators such as R², RMSE, MAE, and MAPE were chosen, which measure how accurately a model predicts compressive strength. Among the models tested, GB gave the best predictive accuracy, with R² = 5.12, MSE = 26.23, and MAE = 4.13, while the Keras Regressor also performed very well, with R² = 0.6948, RMSE = 0.0832, and MAE = 0.0569. These results indicate that the GB and KR models can be useful tools for predicting the compressive strength of SCC and show the potential of machine learning and deep learning methods for concrete materials.
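Several abstracts in this list report model quality with R², (R)MSE, MAE and, in this entry, MAPE. The minimal pure-Python sketch below implements the standard textbook definitions of these metrics; it is not code from any cited study, and the function name `regression_metrics` and the sample numbers are illustrative assumptions (scikit-learn's `sklearn.metrics` module offers equivalent functions).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MSE, RMSE, MAE and MAPE (in percent) for paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 = 1 - SS_res / SS_tot: it can be negative but never exceeds 1.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    # MAPE is undefined when any observed value is zero.
    if all(t != 0 for t in y_true):
        mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    else:
        mape = float("nan")
    return {"R2": r2, "MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Illustrative numbers only (not taken from any cited study)
print(regression_metrics([30.0, 40.0, 50.0], [32.0, 38.0, 50.0]))
# R2 is approximately 0.96 and MAE approximately 1.33
```

By these definitions R² cannot exceed 1: it equals 1 only when every prediction matches its observation, and it becomes negative when a model does worse than always predicting the mean of the observations.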

22

Zhao, Zixuan. "Research on the Distributions of Products for Big Mart." Advances in Economics, Management and Political Sciences 43, no. 1 (2023): 32–39. http://dx.doi.org/10.54254/2754-1169/43/20232121.

Abstract:

This study aims to give a brief insight into the process of business data analysis for Big Mart products and, through it, into the underlying logic of data analysis. It is based on the Big Mart sales dataset from Kaggle, with data collected in 2013 for 1,559 products across 10 stores in different cities. The research aims to build a predictive model that forecasts the sales of each product at specific stores and then to understand which properties of products and outlets play a key role in increasing sales. After applying basic analysis methods in Python, the author obtains the distribution of Big Mart products and builds five simple models to predict outlet sales, selecting the best-performing model by the MAE criterion. The XGB Regressor model performed best and is the most suitable choice for real business use.

23

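Бєліков, М., Т. Ліхоузова, and Ю. Олійник. "Models for Analyzing the Complexity of English Words in Text on a Scale from A1 to C2" [Моделі для аналізу складності англійських слів у тексті за шкалою від A1 до C2]. Адаптивні системи автоматичного управління [Adaptive Systems of Automatic Control] 2, no. 45 (2024): 84–99. http://dx.doi.org/10.20535/1560-8956.45.2024.313091.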

Abstract:

At the current stage of globalization, English plays a key role as the language of international communication. As a result, more and more people are becoming speakers of it at various levels. This work is devoted to analyzing English words on a scale from A1 to C2, corresponding to the lowest and highest proficiency levels of the CEFR standards. A model that predicts the difficulty of words in a text can be used to improve the educational process. For example, it can find a list of probably unknown and difficult words in any text for an end user, depending on their level of English proficiency. Such an approach eases the language-learning process by providing a personalized list of words to focus on. The model can also be useful for analyzing the difficulty of texts based on the number of words of each difficulty level they contain. This can help teachers prepare materials that match their students' level of knowledge and identify words that may be difficult for them to understand. An SQLite data store of English words and their frequencies in English books from 1900 to 2019 was developed, with its functionality implemented in SQL scripts. Python was used to write the ETL processes, analyze the data, and create, train, and compare models predicting word difficulty levels, using the libraries Sqlite3, Lemminflect, NumPy, Seaborn, Matplotlib, SciPy, Sklearn, SpaCy, and XGBoost. A Python application is proposed that retrieves a data sample from the created store, displays it graphically, performs intelligent analysis, and trains and compares models using the accuracy, precision, recall, and F1-score metrics. To analyze the data and predict the difficulty level of English words on the CEFR scale from A1 to C2 based on their frequency in English, the following models were used: PchipInterpolator, a logarithmic model, Gradient Boosting Regressor, Random Forest Regressor, and XGB Regressor.
The results of each model were evaluated on a test set, and the best model was selected for subsequent prediction of the difficulty level of all other English words. 13 references, 14 figures.

24

Ennaji, Oumnia, Sfia Baha, Leonardus Vergutz, and Achraf El Allali. "Gradient boosting for yield prediction of elite maize hybrid ZhengDan 958." PLOS ONE 19, no. 12 (2024): e0315493. https://doi.org/10.1371/journal.pone.0315493.

Abstract:

Understanding accurate methods for predicting yields in complex agricultural systems is critical for effective nutrient management and crop growth. Machine learning has proven to be an important tool in this context, and numerous studies have investigated its potential for predicting yields under different conditions. Among machine learning algorithms, Random Forest (RF) has gained prominence due to its ability to handle large, high-dimensional data sets and to uncover complex non-linear relationships and interactions between variables. RF is particularly suitable for scenarios with categorical variables and missing data. Given the complex web of management practices and their nonlinear effects on yield, it is important to investigate new machine learning algorithms. In this context, our study focused on evaluating gradient boosting methods, particularly Extreme Gradient Boosting (XGB) and Gradient Boosting Regressor (GBR), as potential candidates for yield estimation of the maize hybrid Zhengdan 958. Our aim was not only to evaluate and compare these algorithms with existing approaches, but also to comprehensively analyze the resulting model uncertainties. Our approach includes comparing multiple machine learning algorithms, developing and selecting suitable features, fine-tuning the models by training and adjusting the hyperparameters, and visualizing the results. Using a recent dataset of over 1700 maize yield data pairs, our evaluation covered a spectrum of algorithms. Our results show robust prediction accuracy for all algorithms. In particular, the predictions of XGB (RMSE = 0.37, R² = 0.87, MAE = 0.26) and GBR (RMSE = 0.39, R² = 0.86, MAE = 0.27) emphasized the central role of weather characteristics and confirmed the strong dependence of crop yield prediction on environmental attributes.
Harnessing gradient boosting for yield prediction holds immense potential and is consistent with the method's promise as a catalyst for further investigation in this evolving field.

25

Sopasakis, Alexandros, Maria Nilsson, Mattias Askenmo, Fredrik Nyholm, Lillemor Mattsson Hultén, and Victoria Rotter Sopasakis. "Machine learning evaluation for identification of M-proteins in human serum." PLOS ONE 19, no. 4 (2024): e0299600. http://dx.doi.org/10.1371/journal.pone.0299600.

Abstract:

Serum protein electrophoresis (SPEP) is a method used to analyze the distribution of the most important proteins in the blood. The major clinical question is the presence of monoclonal fraction(s) of antibodies (M-protein/paraprotein), which is essential for the diagnosis and follow-up of hematological diseases such as multiple myeloma. Recent studies have shown that machine learning can be used to assess protein electrophoresis, for example by examining protein glycan patterns to follow up tumor surgery. In this study, we compared 26 different decision tree algorithms for identifying the presence of M-proteins in human serum using numerical data from serum protein capillary electrophoresis. For the automated detection and clustering of data, we used an anonymized data set of 67,073 samples. We found five methods with superior ability to detect M-proteins: Extra Trees (ET), Random Forest (RF), Histogram Gradient Boosting Regressor (HGBR), Light Gradient Boosting Machine (LGBM), and Extreme Gradient Boosting (XGB). Additionally, we implemented a game-theoretic approach to reveal which features in the data set were indicative of the resulting M-protein diagnosis. The results verified the gamma globulin fraction and part of the beta globulin fraction as the most important features of the electrophoresis analysis, further strengthening the reliability of our approach. Finally, we tested the algorithms for classifying M-protein isotypes, where ET and XGB showed the best performance of the five algorithms tested. Our results show that serum capillary electrophoresis combined with decision tree algorithms has great potential for the rapid and accurate identification of M-proteins. Moreover, these methods would be applicable to a variety of blood analyses, such as hemoglobinopathies, indicating wide-ranging diagnostic use.
However, for M-protein isotype classification, combining machine learning solutions for numerical data from capillary electrophoresis with gel electrophoresis image data would be most advantageous.

26

Арађанин, Стефан, and Јелена Сливка. "Application of Time Series Models for Predicting the Global Mean Sea Level" [ПРИМЕНА МОДЕЛА ВРЕМЕНСКИХ СЕРИЈА ЗА ПРЕДИКЦИЈУ СРЕДЊЕГ ГЛОБАЛНОГ НИВОА МОРА]. Zbornik radova Fakulteta tehničkih nauka u Novom Sadu 38, no. 12 (2023): 1770–73. http://dx.doi.org/10.24867/25be38aradjanin.

Abstract:

Global warming refers to the continuous, long-term increase in the Earth's average surface temperature. It is primarily caused by human activities such as burning fossil fuels, deforestation, and releasing harmful gases such as carbon dioxide and methane into the atmosphere. Global warming has profound effects on oceans and seas around the world. Rising sea levels have various consequences on Earth and require urgent mitigation measures. This paper investigates how different factors affect the monthly rise in sea level and its temporal fluctuations. Factors such as temperature, glacier melting rate, sea density, salinity, and carbon dioxide levels were considered. The first part of the paper focuses on data collection, data wrangling, and exploratory data analysis (EDA) of data whose common attribute is a time stamp. The second part focuses on applying time series models, using XGB Regressor, to predict the exact change in global mean sea level. Using the cleaned and analyzed data, the model can incorporate a range of different factors that affect sea level and predict their change based on historical patterns, with a very low error rate.
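Applying a tree-based regressor such as XGB Regressor to a time series usually requires reframing the series as a supervised-learning table in which each target value is paired with the observations that precede it (lag features). The abstract does not say how the authors built their features; the sketch below assumes plain lag features, and `make_lag_features` is a hypothetical helper. The resulting `X` and `y` could then be passed to a regressor's `fit` method, for example `xgboost.XGBRegressor().fit(X, y)`.

```python
def make_lag_features(series, n_lags):
    """Frame a univariate series as (X, y) pairs for supervised regression.

    Row i of X holds the n_lags values that precede y[i], oldest first.
    Hypothetical helper; the cited study's feature design is not specified.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))  # the n_lags previous observations
        y.append(series[t])                   # the value to predict
    return X, y


# Illustrative monthly values (not real sea-level data)
X, y = make_lag_features([1.0, 1.5, 2.1, 2.4, 3.0], n_lags=2)
print(X)  # [[1.0, 1.5], [1.5, 2.1], [2.1, 2.4]]
print(y)  # [2.1, 2.4, 3.0]
```

When a model built this way is evaluated, the split into training and test data would normally be chronological (earlier months for training, later months for testing), so that the model is never trained on observations from the future.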

27

Tan, Lily, James Ruffle, Samia Mohinta, et al. "MATHEMATICAL MODELLING OF SURVIVAL IN LOW GRADE GLIOMAS AT MALIGNANT TRANSFORMATION WITH XGBOOST." Neuro-Oncology 26, Supplement_7 (2024): vii12–vii13. http://dx.doi.org/10.1093/neuonc/noae158.048.

Abstract:

AIMS To develop non-linear machine learning models using the XGBoost algorithm to predict a continuous survival outcome (overall survival, OS) and a binary survival outcome (OS > 5 years) using clinical, molecular and genetic, and radiomic data. METHOD Patients with LGGs treated at a single institution (2005–2020) with histology and MRIs at the time of malignant transformation (MT) were retrospectively included in this study. MRIs were processed with an in-house tumour segmentation pipeline, with radiomic feature extraction of the whole-tumour, enhancing, non-enhancing and oedema components, and of masked disconnectome map components. Patients were split into training and testing sets for the development of the survival models, which were assessed with mean absolute error (MAE) and root mean square error (RMSE) for the prediction of OS, and with receiver operating characteristic analysis for the prediction of OS > 5 years. RESULTS Of 553 patients, 415 were included in the training set and 138 in the testing set. The XGB Regressor model predicted overall survival (OS) from the time of malignant transformation (tMRI) with an MAE of 953 days (RMSE: 1163 days). The XGB Classifier model predicted the probability of OS > 5 years from tMRI with an accuracy of 64% (sensitivity: 58%, specificity: 70%). Age, IDH1 mutation, 1p/19q co-deletion, regularity of tumour shape, and disconnectome-related perilesional components were most predictive of survival outcome. CONCLUSION This study investigated the predictive capabilities of clinical, molecular and genetic, and radiomic data to develop survival analysis models, using XGBoost, to predict OS and OS > 5 years in patients with LGG at tMRI. We corroborate previous findings that age, 1p/19q co-deletion and IDH1 mutation are positive prognosticators for survival.
However, further investigation into the radiomics of the disconnectome, especially of the perilesional oedema compartment, presents an intriguing and novel avenue for survival analysis of patients with LGG.

28

Mukhamediev, Ravil, Yedilkhan Amirgaliyev, Yan Kuchin, et al. "Operational Mapping of Salinization Areas in Agricultural Fields Using Machine Learning Models Based on Low-Altitude Multispectral Images." Drones 7, no. 6 (2023): 357. http://dx.doi.org/10.3390/drones7060357.

Abstract:

Salinization of cultivated soil is an important negative factor that reduces crop yields. Obtaining accurate and timely data on the salinity of soil horizons makes it possible to plan agrotechnical measures that reduce this negative impact. This article describes a method for mapping the soil salinity of the 0–30 cm layer on irrigated arable land using multispectral data collected by a UAV. The research was carried out in the south of the Almaty region of Kazakhstan. In May 2022, 80 soil samples were collected during a ground survey, and an overflight of two adjacent fields was performed with a UAV equipped with a multispectral camera. A data preprocessing method is proposed, and several machine learning algorithms are compared (XGBoost, LightGBM, random forest, support vector machines, ridge regression, elastic net, etc.). The machine learning methods were used for regression to predict the electrical conductivity of the 0–30 cm soil layer from an optimized list of spectral indices. The XGB regressor model gave the best results: the coefficient of determination was 0.701, the mean squared error was 0.508, and the mean absolute error was 0.514. A comparison was made with results obtained from Landsat 8 data using a similar model. Soil salinity mapping using UAVs provides much better spatial detail than satellite data, allows the survey time to be chosen freely, depends less on cloud cover, and offers a comparable degree of accuracy.

29

Chin, Elizabeth L., Gabriel Simmons, Yasmine Y. Bouzid, et al. "Nutrient Estimation from 24-Hour Food Recalls Using Machine Learning and Database Mapping: A Case Study with Lactose." Nutrients 11, no. 12 (2019): 3045. http://dx.doi.org/10.3390/nu11123045.

Abstract:

The Automated Self-Administered 24-Hour Dietary Assessment Tool (ASA24) is a free dietary recall system that outputs fewer nutrients than the Nutrition Data System for Research (NDSR). NDSR uses the Nutrition Coordinating Center (NCC) Food and Nutrient Database, both of which require a license. Manually looking up ASA24 foods in NDSR is time-consuming but is currently the only way to acquire NCC-exclusive nutrients. Using lactose as an example, we evaluated machine learning and database matching methods to estimate this NCC-exclusive nutrient from ASA24 reports. ASA24-reported foods were manually looked up in NDSR to obtain lactose estimates and split into training (n = 378) and test (n = 189) datasets. Nine machine learning models were developed to predict lactose from the nutrients common to ASA24 and the NCC database. Database matching algorithms were developed to match NCC foods to an ASA24 food using only nutrients ("Nutrient-Only") or both nutrients and food descriptions ("Nutrient + Text"). For both methods, the lactose values were compared with the manual curation. Among machine learning models, the XGB-Regressor model performed best on held-out test data (R² = 0.33). For the database matching method, Nutrient + Text matching yielded the best lactose estimates (R² = 0.76), a vast improvement over the status quo of no estimate. These results suggest that computational methods can successfully estimate an NCC-exclusive nutrient for foods reported in ASA24.

30

De Carolis, Gabriele, Vincenzo Giannico, Leonardo Costanza, et al. "Prediction of Winter Wheat Parameters with Planet SuperDove Imagery and Explainable Artificial Intelligence." Agronomy 15, no. 1 (2025): 241. https://doi.org/10.3390/agronomy15010241.

Abstract:

This study investigated the application of high-resolution satellite imagery from SuperDove satellites combined with machine learning algorithms to estimate the spatiotemporal variability of several winter wheat parameters, including the relative leaf chlorophyll content (RCC), relative water content (RWC), and aboveground dry matter (DM). The research was carried out within an experimental field in Southern Italy during the 2024 growing season. Different machine learning (ML) algorithms were trained and compared using spectral band data and calculated vegetation indices (VIs) as predictors. Model performance was assessed using R² and RMSE. The ML models tested were random forest (RF), support vector regressor (SVR), and extreme gradient boosting (XGB). RF outperformed the other ML algorithms in the prediction of RCC when using VIs as predictors (R² = 0.81) and in the prediction of RWC and DM when using spectral band data as predictors (R² = 0.71 and 0.87, respectively). Model explainability was assessed with the SHAP method. A SHAP analysis highlighted that GNDVI, Cl1, and NDRE were the most important VIs for predicting RCC, while the yellow and red bands were the most important for DM prediction, and the yellow and NIR bands for RWC prediction. The best model found for each target was used to model its seasonal trend and produce a variability map. This approach highlights the potential of integrating ML and high-resolution satellite imagery for the remote monitoring of wheat, which can support sustainable farming practices.
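SHAP, used in this study and in several others in this list, attributes a prediction to its input features using Shapley values from cooperative game theory. The sketch below computes exact Shapley values by enumerating every feature coalition, which is only feasible for a handful of features; it is not the algorithm implemented in the SHAP library, which relies on approximations and model-specific methods such as TreeSHAP. The value function (absent features are filled in from a single baseline vector) is an assumption, and `exact_shapley` and `toy_model` are hypothetical names.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for a single input x by brute-force enumeration.

    Assumed value function: v(S) is the model output when features in S take
    their values from x and all remaining features take values from baseline.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Weighted marginal contribution of feature i added to coalition s
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model with an interaction between features 0 and 2."""
    return 2 * z[0] + z[1] + z[0] * z[2]


phi = exact_shapley(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # approximately [3.5, 2.0, 1.5]; the values sum to 7.0
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline (here 7.0 - 0.0), the efficiency property that lets Shapley values be read as an additive breakdown of a single prediction. In the toy model, the interaction term `z[0] * z[2]` is split equally between features 0 and 2.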

31

Azarifar, Mohammad, Kerem Ocaksonmez, Ceren Cengiz, Reyhan Aydoğan, and Mehmet Arik. "Machine Learning to Predict Junction Temperature Based on Optical Characteristics in Solid-State Lighting Devices: A Test on WLEDs." Micromachines 13, no. 8 (2022): 1245. http://dx.doi.org/10.3390/mi13081245.

Abstract:

While junction temperature control is an indispensable part of reliable solid-state lighting, there is no direct method to measure junction temperature. Among various methods, techniques based on temperature-sensitive optical parameters have been used in practice. Researchers calibrate different spectral power distribution behaviors to a specific temperature and then use that calibration to predict the junction temperature. White light in white LEDs is composed of blue chip emission and down-converted emission from photoluminescent particles, each with its own behavior at different temperatures. These two emissions can be combined in an unlimited number of ways to produce diverse white colors at different brightness levels. The shape of the spectral power distribution can, in essence, be compressed into a correlated color temperature (CCT). The intensity level of the spectral power distribution can be inferred from the luminous flux, since the luminous flux is a specially weighted integral of the spectral power distribution. This paper demonstrates that the color characteristics and power level provide enough information to train regressors that predict the junction temperature of any white LED. A database compiled from manufacturer datasheets is used to develop four machine-learning-based models, namely k-Nearest Neighbor (KNN), Radius Near Neighbors (RNN), Random Forest (RF), and Extreme Gradient Boosting (XGB). The models were used to predict junction temperatures from a set of dynamic opto-thermal measurements. This study shows that machine learning algorithms can be employed as reliable novel prediction tools for junction temperature estimation, particularly where measuring equipment is limited, as in wafer-level probing or phosphor-coated chips.

32

Ding, Luyu, Yang Lv, Ruixiang Jiang, et al. "Predicting the Feed Intake of Cattle Based on Jaw Movement Using a Triaxial Accelerometer." Agriculture 12, no. 7 (2022): 899. http://dx.doi.org/10.3390/agriculture12070899.

Abstract:

The use of an accelerometer is considered a promising method for the automatic measurement of the feeding behavior or feed intake of cattle and could greatly facilitate daily management. For commercial use, an efficient classification algorithm that works at a low sampling frequency is needed to reduce the amount of recorded data and extend the battery life of the monitoring device, and a high-precision model must be developed to predict feed intake from feeding behavior. Accelerograms of the jaw movement and feed intake of 13 mid-lactating cows were collected during feeding at a sampling frequency of 1 Hz at three positions: the nasolabial levator muscle (P1), the right masseter muscle (P2), and the left lower lip muscle (P3). A behavior identification framework was developed to recognize jaw movements, including ingesting, chewing, and ingesting–chewing, using extreme gradient boosting (XGB) integrated with a hidden Markov model solved by the Viterbi algorithm (HMM–Viterbi). Fourteen machine learning models were established and compared to predict the feed intake rate from the accelerometer signals of recognized jaw movement activities. The behavior identification framework recognized the different jaw movement activities effectively, with a precision of 99% at a window size of 10 s. The measured feed intake rate was 190 ± 89 g/min and could be predicted efficiently using the extra trees regressor (ETR), whose R², RMSE, and NME were 0.97, 0.36, and 0.05, respectively. The choice among the three investigated monitoring sites may have affected the accuracy of feed intake prediction, but not behavior identification. P1 is recommended as the monitoring site, and the results of this study provide a reference for the further development of wearable devices equipped with accelerometers to measure feeding behavior and predict feed intake.
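The behavior identification framework in this abstract pairs a per-window classifier (XGB) with a hidden Markov model decoded by the Viterbi algorithm. A minimal log-space Viterbi sketch follows. How the study connected the XGB outputs to the HMM is not described in the abstract; using per-window class probabilities as emission scores is an assumption here, and the two states and all probabilities in the example are invented for illustration.

```python
import math


def viterbi(start_p, trans_p, emit_scores):
    """Most likely hidden-state sequence of an HMM, computed in log space.

    start_p[s]        -- initial probability of state s
    trans_p[a][b]     -- probability of moving from state a to state b
    emit_scores[t][s] -- likelihood of observation t under state s (assumed here
                         to be per-window class probabilities from a classifier)
    """
    if not emit_scores:
        return []

    def log(p):
        # log(0) is treated as -inf so impossible transitions are never chosen
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(start_p)
    score = [log(start_p[s]) + log(emit_scores[0][s]) for s in range(n_states)]
    backpointers = []
    for t in range(1, len(emit_scores)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at step t
            prev = max(range(n_states), key=lambda a: score[a] + log(trans_p[a][s]))
            new_score.append(score[prev] + log(trans_p[prev][s]) + log(emit_scores[t][s]))
            pointers.append(prev)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the most likely final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state example: 0 = ingesting, 1 = chewing
path = viterbi(
    start_p=[0.5, 0.5],
    trans_p=[[0.9, 0.1], [0.1, 0.9]],
    emit_scores=[[0.8, 0.2], [0.4, 0.6], [0.8, 0.2]],
)
print(path)  # [0, 0, 0]
```

Taking the per-window maximum alone would label the middle window as chewing (0.6 > 0.4); the "sticky" transition probabilities lead the Viterbi decoding to smooth that isolated switch away, which is a common motivation for placing an HMM on top of a window-level classifier.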

33

Alghamdi, Ali A. A. "Machine Learning for Predicting Neutron Effective Dose." Applied Sciences 14, no. 13 (2024): 5740. http://dx.doi.org/10.3390/app14135740.

Abstract:

The calculation of effective doses is crucial in many medical and radiation fields to ensure safety and compliance with regulatory limits. Traditionally, Monte Carlo codes using detailed computational phantoms of the human body have been used for such calculations. Monte Carlo dose calculations can be time-consuming and require expertise both in building the computational phantom and in performing the dose calculations. This study employs various machine learning (ML) algorithms to predict the organ doses and effective dose conversion coefficients (DCCs) of different anthropomorphic phantoms. A comprehensive data set comprising neutron energy bins, organ labels, masses, and densities is compiled from Monte Carlo studies and used to train and evaluate the supervised ML models. This study includes a broad range of phantoms, including those from the International Commission on Radiological Protection (ICRP-110, ICRP-116 phantom), the Visible-Human Project (VIP-man phantom), and the Medical Internal Radiation Dose Committee (MIRD-Phantom), with raw data prepared as numerical data and categorically labeled organ data. Extreme gradient boosting (XGB), gradient boosting (GB), and the random-forest-based Extra Trees regressor are employed, and the performance of the ML models is assessed against published ICRP neutron DCC values using the mean square error, mean absolute error, and R² metrics. The results demonstrate that the ML predictions vary significantly in lower energy ranges and less in higher neutron energy ranges, while showing good agreement with ICRP values at mid-range energies. Moreover, the categorical data models align closely with the reference doses, suggesting the potential of ML for predicting effective doses for custom phantoms based on regional populations, such as the Saudi voxel-based model.
This study paves the way for efficient dose prediction using ML, particularly in scenarios requiring rapid results without extensive computational resources or expertise. The findings also indicate potential improvements in data representation and the inclusion of larger data sets to refine model accuracy and prevent overfitting. Thus, ML methods can serve as valuable techniques for the continued development of personalized dosimetry.

34

Mehraein, Mojtaba, Aadhityaa Mohanavelu, Sujay Raghavendra Naganna, Christoph Kulls, and Ozgur Kisi. "Monthly Streamflow Prediction by Metaheuristic Regression Approaches Considering Satellite Precipitation Data." Water 14, no. 22 (2022): 3636. http://dx.doi.org/10.3390/w14223636.

Abstract:

In this study, the viability of three metaheuristic regression techniques, CatBoost (CB), random forest (RF), and extreme gradient tree boosting (XGBoost, XGB), is investigated for predicting monthly streamflow using satellite precipitation data. Monthly streamflow data from three measuring stations in Turkey and satellite rainfall data from the Tropical Rainfall Measuring Mission (TRMM) were used as model inputs to predict streamflow 1 month ahead. Such predictions are crucial for decision-making in water resource planning and management, including water allocation, water market planning, water supply restrictions, and drought management. The outcomes of the metaheuristic regression methods were compared with those of artificial neural networks (ANN) and nonlinear regression (NLR). The effect of the periodicity component was also investigated by adding the month number of the streamflow data as an input. In the first part of the study, the streamflow at each station was predicted using the CB, RF, XGB, ANN, and NLR methods with TRMM data. In the second part, streamflow at the downstream station was predicted using data from upstream stations. In both parts, the CB and XGB methods generally provided similar accuracy and outperformed the RF, ANN, and NLR methods. The use of TRMM rainfall data and the periodicity component considerably improved the efficiency of the metaheuristic regression methods in modeling (predicting) streamflow. Using TRMM data as inputs reduced the root mean square error (RMSE) of CB, RF, and XGB by 36%, 31%, and 24%, respectively, on average, while the corresponding reductions were 37%, 18%, and 43% after periodicity information was added to the model inputs.

35

Margoum, Safae, Bekkay Hajji, Stefano Aneli, Giuseppe Marco Tina, and Antonio Gagliano. "Optimizing Nanofluid Hybrid Solar Collectors through Artificial Intelligence Models." Energies 17, no. 10 (2024): 2307. http://dx.doi.org/10.3390/en17102307.

Abstract:

This study systematically explores and compares the performance of various artificial intelligence (AI)-based models for predicting the electrical and thermal efficiency of photovoltaic–thermal systems (PVTs) cooled by nanofluids. Extreme gradient boosting (XGB), extra tree regression (ETR), and k-nearest-neighbor (KNN) regression models are employed, and their accuracy and effectiveness are quantitatively evaluated. The results demonstrate that both the XGB and ETR models consistently outperform KNN in predicting both electrical and thermal efficiency. Specifically, the XGB model achieves coefficient of determination (R2) values of approximately 0.99999, signifying its superior predictive capability. Notably, the XGB model performs slightly better than ETR in estimating electrical efficiency. For thermal efficiency, both XGB and ETR perform very well, with the XGB model showing a slight edge based on R2 values. Validation against new data points also shows outstanding predictive performance, with the XGB model attaining R2 values of 0.99997 for electrical efficiency and 0.99995 for thermal efficiency. These quantitative findings underscore the accuracy and reliability of the XGB and ETR models in predicting the electrical and thermal efficiency of nanofluid-cooled PVT systems. The study has significant implications for PVT system designers and industry professionals, as AI-based models offer improved accuracy, faster prediction times, and the ability to handle large datasets. The models presented in this study contribute to system optimization, performance evaluation, and decision-making in the field. Additionally, robust validation against new data enhances the credibility of these models, advancing the overall understanding and applicability of AI in PVT systems.

36

Nguyen, Van-Hai, Tien-Thinh Le, Hoanh-Son Truong, et al. "Applying Bayesian Optimization for Machine Learning Models in Predicting the Surface Roughness in Single-Point Diamond Turning Polycarbonate." Mathematical Problems in Engineering 2021 (June 17, 2021): 1–16. http://dx.doi.org/10.1155/2021/6815802.

Abstract:

This paper deals with predicting surface roughness in the manufacturing of polycarbonate (PC) by applying Bayesian optimization to machine learning models. The input variables of ultraprecision turning (feed rate, depth of cut, spindle speed, and vibration along the X-, Y-, and Z-axes) are the main factors affecting surface quality. In this research, six machine learning (ML)-based models were applied to predict the surface roughness (Ra): artificial neural network (ANN), CatBoost Regression (CAT), Support Vector Regression (SVR), Gradient Boosting Regression (GBR), Decision Tree Regression (DTR), and Extreme Gradient Boosting Regression (XGB). The predictive performance of the baseline models was quantitatively assessed with three error metrics: root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). The overall results indicate that the XGB and CAT models predict Ra with the greatest accuracy. To improve these baseline models, Bayesian optimization (BO) was then used to determine their best hyperparameters, and the results indicate that XGB is the best model according to the evaluation metrics. BO improved the performance of the models significantly. For example, on the training dataset, the RMSE and MAE of XGB decreased from 0.0076 to 0.0047 and from 0.0063 to 0.0027, respectively. On the testing dataset, the RMSE and MAE of XGB decreased from 0.4033 to 0.2512 and from 0.2845 to 0.2225, respectively. Moreover, the vibrations of the X, Y, and Z axes and the feed rate are the most significant features for predicting Ra, which agrees closely with the literature. We find that, within a specified value domain, the vibration of the axes has a greater influence on surface quality than the cutting conditions do.
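RMSE, MAE, and R2 recur throughout the abstracts in this list. For reference, the textbook definitions can be sketched in plain Python. This is an illustration of the standard formulas only, not code from any of the cited studies; the function names and the toy values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the residuals."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only (not data from the cited paper).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

R2 computed this way is the coefficient of determination rather than a correlation coefficient. On held-out data it can even be negative when a model does worse than predicting the mean, so it is not in general the square of Pearson's r.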

37

Pane, Syafrial Fachri, Rofi Nafiis Zain, Iwan Setiawan, and Virdiandry Putratama. "Predicting the Happiness Index Based on the HDI Indicator in Indonesia Using the Ensemble Learning Approach." NUANSA INFORMATIKA 19, no. 2 (2025): 105–14. https://doi.org/10.25134/ilkom.v19i2.410.

Abstract:

Machine learning is used to analyze complex data in many fields of research. In this study, we applied Random Forest Regression (RF), XGBoost Regression (XGB), and Decision Tree Regression (DT), together with Pearson correlation analysis, Permutation Importance (PI), and Shapley Additive Explanations (SHAP), first to analyze the relationship between the Human Development Index (HDI) and happiness indicators in Indonesia, and second to build a prediction model with a stacking ensemble of RF, XGB, and DT. First, the Pearson correlation, PI, and SHAP analyses show that the happiness score of Indonesians is strongly correlated with the HDI. The Pearson correlation coefficient is 0.88, indicating a very strong positive relationship between HDI and happiness, and the PI and SHAP analyses confirm that HDI is one of the most influential variables in predicting happiness scores in Indonesia. Second, the stacking regressor predicts happiness with an R-squared value of 97.68%, MAE of 0.002900, MSE of 0.000021, and RMSE of 0.004604.

38

Guo, Chao-Yu, and Ke-Hao Chang. "A Novel Algorithm to Estimate the Significance Level of a Feature Interaction Using the Extreme Gradient Boosting Machine." International Journal of Environmental Research and Public Health 19, no. 4 (2022): 2338. http://dx.doi.org/10.3390/ijerph19042338.

Abstract:

Recent studies have revealed the importance of interaction effects in cardiac research. An analysis can lead to erroneous conclusions when it fails to account for a significant interaction. Regression models handle an interaction by adding the product of the two interacting variables, so statistical methods can evaluate the significance and contribution of the interaction term. However, machine learning strategies cannot provide the p-value of a specific feature interaction. We therefore propose a novel machine learning algorithm, the extreme gradient boosting machine for feature interaction (XGB-FI), to assess the p-value of a feature interaction. The first step borrows from statistical methodology by stratifying the original data into four subgroups according to the two interacting features. The second step builds four XGB machines with cross-validation to avoid overfitting. The third step calculates a newly defined feature interaction ratio (FIR) for all possible combinations of predictors. Finally, the empirical p-value is calculated from the FIR distribution. Computer simulation studies compared XGB-FI with a multiple regression model with an interaction term. The results showed that XGB-FI keeps the type I error within the nominal level of 0.05 when there is no interaction effect, and its power is consistently higher than that of the multiple regression model in all scenarios examined. In conclusion, the new machine learning algorithm outperforms the conventional statistical model when searching for an interaction.
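The first and last steps of XGB-FI, stratification and the empirical p-value, can be illustrated in plain Python. This sketch rests on several assumptions. First, the two interacting features are taken to be binary, so the four subgroups are the 2×2 cells. Second, the observed FIR is compared against a reference set of FIR values, such as those computed for the other predictor combinations. Third, it uses the common add-one empirical p-value. The abstract does not define FIR or the exact p-value formula, so `stratify_2x2`, `empirical_p_value`, and all numbers below are hypothetical, and the XGB training and cross-validation steps are omitted.

```python
from collections import defaultdict

def stratify_2x2(rows, feat_a, feat_b):
    """Split records into four subgroups keyed by the values of two binary features."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[feat_a], row[feat_b])].append(row)
    return dict(groups)

def empirical_p_value(observed, reference):
    """Add-one empirical p-value: share of reference values >= the observed statistic."""
    exceed = sum(1 for r in reference if r >= observed)
    return (exceed + 1) / (len(reference) + 1)

# Hypothetical toy records with two binary features "a" and "b".
rows = [
    {"a": 0, "b": 0, "y": 1.0},
    {"a": 0, "b": 1, "y": 1.5},
    {"a": 1, "b": 0, "y": 2.0},
    {"a": 1, "b": 1, "y": 4.0},
    {"a": 1, "b": 1, "y": 3.8},
]
groups = stratify_2x2(rows, "a", "b")
print(sorted((k, len(v)) for k, v in groups.items()))

# Hypothetical FIR values; in XGB-FI these would come from the fitted XGB models.
reference_fir = [0.8, 1.0, 1.1, 0.9, 1.3, 1.05, 0.95, 1.2, 0.85, 1.15]
print(empirical_p_value(1.25, reference_fir))  # 2/11, about 0.182
```

The add-one correction keeps the p-value strictly positive, which is a usual choice for permutation-style tests. Whether XGB-FI applies it is not stated in the abstract.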

39

Kupin, A., V. Lyashenko, V. Holiver, and Y. Sherstnev. "Forecasting the Load on the Electric Grid of a Mining and Processing Plant." Sciences of Europe, no. 151 (October 27, 2024): 95–104. https://doi.org/10.5281/zenodo.13999038.

Abstract:

The relevance of this work stems from the need for a comprehensive study of how efficiently electricity consumption is managed in the power supply systems of concentrating plants, based on forecasting the load on their power grid. An accurate forecast significantly simplifies managerial decision-making in strategic and operational planning and minimizes the cost of electricity per unit of output (ore concentrate). Purpose: to substantiate the efficiency of electricity consumption in ore enrichment and processing based on managing and forecasting the load on the electrical network of concentrating plants. This makes it possible to use industrial energy storage devices at the right time to reduce consumption from the grid during peak hours (under a two-rate tariff) or in the peak zone of the day (under a differentiated electricity tariff). Objects: enrichment plants, in which it is difficult to determine the energy characteristics of process equipment and to manage electricity consumption effectively based on forecasts of the grid load. Methods: complex generalization, analysis, and evaluation of practical experience and scientific achievements in managerial decision-making for strategic and operational planning and in minimizing electricity costs per unit of output. In particular, the literature was reviewed; theoretical generalization was applied using mathematical statistics and physical and mathematical modeling; and calculations, feasibility studies, laboratory and full-scale experiments, and industrial tests at operating enterprises were carried out according to standard and new methods with the participation of the authors.
Results. Based on experimental studies, the factors that influence electricity consumption and should be included in the machine learning model were determined, which significantly simplifies managerial decision-making in the strategic and operational planning of electricity costs and minimizes the cost of electricity per unit of output. The important features were found to be the ambient temperature (correlation coefficient with the target variable, the total uncontrolled load, of 0.17), humidity (-0.22), the weekend/working-day indicator (-0.33), day of the week (-0.38), and time data (0.19 for time; < 0.068 for month and year). A machine learning model based on extreme gradient boosting regression (XGB Regressor) is proposed, which provides the most reliable forecast of the load on the concentrator's power grid. The mean absolute prediction error was 3.51, and the coefficient of determination was 0.84 and 0.87 for the training and test samples, respectively. The results of the study can be adapted to other facilities. Modeling of a fuzzy intelligent control system (ISC) showed potential for reducing total energy consumption by 15–35% in modes with additional hydro-accumulating electrical energy (EE) generation (a two-rate tariff was studied) at various enterprises that extract and beneficiate iron ore.

40

Kim, Ki Hong, Jeong Ho Park, Young Sun Ro, Ki Jeong Hong, Kyoung Jun Song, and Sang Do Shin. "Emergency department routine data and the diagnosis of acute ischemic heart disease in patients with atypical chest pain." PLOS ONE 15, no. 11 (2020): e0241920. http://dx.doi.org/10.1371/journal.pone.0241920.

Abstract:

Background: Due to an aging population and the increasing proportion of patients with various comorbidities, the number of patients with acute ischemic heart disease (AIHD) who present to the emergency department (ED) with atypical chest pain is increasing. The aim of this study was to develop and validate a prediction model for AIHD in patients with atypical chest pain. Methods and results: A chest pain workup registry, an ED administrative database, and a clinical data warehouse database from a single academic ED (2014–2018) were analyzed and integrated using nonidentifiable key factors to create a comprehensive clinical dataset. Demographic findings, vital signs, and routine laboratory test results were assessed for their ability to predict AIHD. An extreme gradient boosting (XGB) model was developed and evaluated, and its performance was compared with that of a single-variable model and a logistic regression model. The area under the receiver operating characteristic curve (AUROC) was calculated to assess discrimination. A calibration plot and partial dependence plots were also used in the analyses. Overall, 4,978 patients were analyzed. Of the 3,833 patients in the training cohort, 453 (11.8%) had AIHD; of the 1,145 patients in the validation cohort, 166 (14.5%) had AIHD. The XGB, troponin (single-variable), and logistic regression models showed similar discrimination (AUROC [95% confidence interval]: XGB model, 0.75 [0.71–0.79]; troponin model, 0.73 [0.69–0.77]; logistic regression model, 0.73 [0.70–0.79]). Most patients were classified as non-AIHD; calibration was good in patients with a low predicted probability of AIHD in all prediction models. Unlike the logistic regression model, the XGB model revealed nonlinear relationships, such as threshold and U-shaped relationships, between the variables and the probability of AIHD. Conclusion: We developed and validated an AIHD prediction model for patients with atypical chest pain using an XGB model.

41

Xu, Wenhao, Xiaogang Liu, Jianhua Dong, et al. "Improvement of Citrus Yield Prediction Using UAV Multispectral Images and the CPSO Algorithm." Agronomy 15, no. 1 (2025): 171. https://doi.org/10.3390/agronomy15010171.

Abstract:

Achieving timely and non-destructive assessments of crop yield is a key challenge in agriculture, as it is important for optimizing field management measures and improving crop productivity. To predict citrus yield accurately and quickly, this study acquired multispectral images of citrus at fruit maturity with an unmanned aerial vehicle (UAV) and extracted multispectral vegetation indices (VIs) and texture features (T) from the images as feature variables. Extreme gradient boosting (XGB), random forest (RF), support vector machine (SVM), Gaussian process regression (GPR), and multiple stepwise regression (MSR) models were used to build prediction models for citrus fruit number and quality. The results show that, for fruit number prediction, the XGB model performed best with the combined input of VIs and T, with R2 = 0.792 and RMSE = 462 fruits. For fruit quality prediction, however, the RF model performed best when only the VIs were used, with R2 = 0.787 and RMSE = 20.0 kg. Although the model accuracy was acceptable, a large number of input feature variables was required. To further improve prediction performance, we explored coupling a hybrid coding particle swarm optimization algorithm (CPSO) with the XGB and SVM models. The coupled models significantly improved the prediction of citrus fruit number and quality, especially CPSO coupled with XGB (CPSO-XGB), which used fewer input features and achieved higher accuracy, with R2 > 0.85. Finally, the Shapley additive explanations (SHAP) method was used to reveal the importance of the normalized difference chlorophyll index (NDCI) and the red band mean feature (MEA_R) in constructing the prediction model. The results of this study provide an application reference and a theoretical basis for research on UAV remote sensing of citrus yield.

42

Xu, Chan, Wencai Liu, Chengliang Yin, et al. "Establishment and Validation of a Machine Learning Prediction Model Based on Big Data for Predicting the Risk of Bone Metastasis in Renal Cell Carcinoma Patients." Computational and Mathematical Methods in Medicine 2022 (October 3, 2022): 1–8. http://dx.doi.org/10.1155/2022/5676570.

Abstract:

Purpose. Since the prognosis of renal cell carcinoma (RCC) patients with bone metastasis (BM) is poor, this study aimed to use big data to build a machine learning (ML) model that predicts the risk of BM in RCC patients. Methods. A retrospective study was conducted on 40,355 RCC patients in the SEER database from 2010 to 2017. LASSO regression and multivariate logistic regression analyses were performed to determine independent risk factors for RCC-BM. Six ML algorithms (LR, GBM, XGB, RF, DT, and NBC) were used to build risk models for predicting RCC-BM, and their predictive performance was evaluated with 10-fold cross-validation. Results. Of the 40,355 patients diagnosed with RCC in the SEER database, 1,811 (4.5%) had BM. Independent risk factors for BM were tumor grade, T stage, N stage, liver metastasis, lung metastasis, and brain metastasis. Among the RCC-BM risk prediction models built with the six ML algorithms, the XGB model showed the best predictive performance (AUC = 0.891). A web-based calculator built on the XGB model was therefore established to assess each RCC patient's individual risk of BM. Conclusion. The ML-based XGB risk prediction model predicted BM in RCC patients well.

43

Yin, Jing-Mei, Yang Li, Jun-Tang Xue, Guo-Wei Zong, Zhong-Ze Fang, and Lang Zou. "Explainable Machine Learning-Based Prediction Model for Diabetic Nephropathy." Journal of Diabetes Research 2024 (January 20, 2024): 1–13. http://dx.doi.org/10.1155/2024/8857453.

Abstract:

The aim of this study is to analyze the effect of serum metabolites on diabetic nephropathy (DN) and to predict the prevalence of DN with a machine learning approach. The dataset consists of 548 patients treated at the Second Affiliated Hospital of Dalian Medical University (SAHDMU) from April 2018 to April 2019. We select the 38 optimal features using a least absolute shrinkage and selection operator (LASSO) regression model with 10-fold cross-validation. We compare four machine learning algorithms (extreme gradient boosting (XGB), random forest, decision tree, and logistic regression) using AUC-ROC curves, decision curves, and calibration curves. We quantify feature importance and interaction effects in the optimal predictive model with the Shapley additive explanation (SHAP) method. The XGB model performs best in screening for DN, with the highest AUC value of 0.966. The XGB model also yields greater clinical net benefit than the others and shows a better fit. In addition, there are significant interactions between serum metabolites and the duration of diabetes. We develop a predictive model with the XGB algorithm to screen for DN. C2, C5DC, Tyr, Ser, Met, C24, C4DC, and Cys contribute substantially to the model and may be biomarkers for DN.

44

Deshpand, Sujit, Deipali Gore, Amrapali Chavan, and Arundhati Nelli. "An adaptive machine learning model for IoT security to detect spoofing and DOS attacks." International Journal on Information Technologies and Security 17, no. 1 (2025): 79–90. https://doi.org/10.59035/mzgi5749.

Abstract:

The rising number of Internet of Things (IoT) devices necessitates robust security to counter threats like Denial-of-Service (DoS) and spoofing attacks. Many Intrusion Detection Systems (IDS) leverage Machine Learning (ML), but challenges like concept drift and class imbalance persist. This study presents an adaptive ML-based IDS framework using a Modified XGBoost (Mod-XGB) model. Mod-XGB incorporates a weighted loss function for class imbalance, adaptive instance weighting for concept drift, and a security penalty term to enhance feature selection. Evaluated on the CICIoV2024 dataset, Mod-XGB achieved 97.96% accuracy, outperforming models like Random Forest, Logistic Regression, and AdaBoost. This innovative approach effectively addresses IoT security challenges, offering significant advancements in IDS performance for improved attack detection and resilience in dynamic environments.

45

Sun, Fei, Fang Fang, Run Wang, et al. "An Impartial Semi-Supervised Learning Strategy for Imbalanced Classification on VHR Images." Sensors 20, no. 22 (2020): 6699. http://dx.doi.org/10.3390/s20226699.

Abstract:

Imbalanced learning is a common problem in remote sensing imagery-based land-use and land-cover classification. It can reduce classification accuracy and even cause the minority class to be omitted. In this paper, an impartial semi-supervised learning strategy based on extreme gradient boosting (ISS-XGB) is proposed to classify very high resolution (VHR) images with imbalanced data. ISS-XGB solves multi-class classification by using several semi-supervised classifiers. It first employs multiple groups of unlabeled data to eliminate the imbalance of training samples and then uses gradient boosting-based regression to model the target classes with positive and unlabeled samples. In this study, experiments were conducted on eight study areas with different imbalance situations. The results showed that ISS-XGB performed comparably to, but more stably than, most commonly used classification approaches (i.e., random forest (RF), XGB, multilayer perceptron (MLP), and support vector machine (SVM)), positive and unlabeled learning (PU-Learning) methods (PU-BP and PU-SVM), and typical synthetic sample-based imbalanced learning methods. Especially in extremely imbalanced situations, ISS-XGB can provide high accuracy for the minority class without losing overall performance (the average overall accuracy reaches 85.92%). The proposed strategy has great potential for solving imbalanced classification problems in remote sensing.

46

Cahya Putri Buani, Duwi, and Nia Nuraeni. "Application of XGB Classifier for Obesity Rate Prediction." Jurnal Riset Informatika 6, no. 1 (2023): 1–6. http://dx.doi.org/10.34288/jri.v6i1.260.

Abstract:

According to the Ministry of Health, 13.5% of adults aged 18 years and over in Indonesia are overweight, 28.7% are obese with a BMI ≥ 25, and 15.4% are obese with a BMI ≥ 27. Among children aged 5–12 years, 18.8% were overweight and 10.8% were obese. These figures show that prevention is needed to reduce the share of the population with obesity, and one way to support prevention is early detection of obesity levels using machine learning. This study predicts obesity levels with seven models: Naive Bayes (NB), Random Forest (RF), K-NN, Decision Tree Classifier (DTC), SVM, XGB Classifier (XGB), and Logistic Regression (LR). Of these seven models, the XGB Classifier achieved the highest performance, with an accuracy of 0.96, an F1-score of 0.96, and a precision and recall of 0.96.

47

Li, Yujie, Zhongmin Liang, Yiming Hu, Binquan Li, Bin Xu, and Dong Wang. "A multi-model integration method for monthly streamflow prediction: modified stacking ensemble strategy." Journal of Hydroinformatics 22, no. 2 (2019): 310–26. http://dx.doi.org/10.2166/hydro.2019.066.

Abstract:

In this study, we evaluate elastic net regression (ENR), support vector regression (SVR), random forest (RF), and eXtreme Gradient Boosting (XGB) models and propose a modified multi-model integration method, the modified stacking ensemble strategy (MSES), for monthly streamflow forecasting. We apply these methods to the Three Gorges Reservoir in the Yangtze River Basin, and the results show the following: (1) RF and XGB present better and more stable forecast performance than ENR and SVR, indicating that machine learning-based models have potential for monthly streamflow forecasting. (2) The MSES can effectively reconstruct the original training data in the first layer and optimize the XGB model in the second layer, improving forecast performance. We believe that the MSES is a computing framework worth developing, with a simple mathematical structure and low computational cost. (3) Forecast performance depends mainly on the size and distribution characteristics of the monthly streamflow sequence, which remains difficult to predict using only climate indices.

48

Islam, Md Merajul, Md Jahangir Alam, Md Maniruzzaman, et al. "Predicting the risk of hypertension using machine learning algorithms: A cross sectional study in Ethiopia." PLOS ONE 18, no. 8 (2023): e0289613. http://dx.doi.org/10.1371/journal.pone.0289613.

Abstract:

Background and objectives: Hypertension (HTN), a major global health concern, is a leading cause of cardiovascular disease, premature death, and disability worldwide. It is important to develop an automated system to diagnose HTN at an early stage. Therefore, this study devised a machine learning (ML) system for predicting patients at risk of developing HTN in Ethiopia. Materials and methods: The HTN data were collected in Ethiopia and included 612 respondents with 27 factors. We employed the Boruta feature selection method to identify the important risk factors for HTN. Four well-known models [logistic regression, artificial neural network, random forest, and extreme gradient boosting (XGB)] were developed on the training set using the selected risk factors to predict HTN patients. The models' performance was evaluated on the testing set by accuracy, precision, recall, F1-score, and area under the curve (AUC). Additionally, the SHapley Additive exPlanations (SHAP) method, an explainable artificial intelligence (XAI) technique, was used to investigate the predictive risk factors associated with HTN. Results: The overall prevalence of HTN was 21.2%. The XGB-based model was the most appropriate for predicting patients at risk of HTN, achieving an accuracy of 88.81%, precision of 89.62%, recall of 97.04%, F1-score of 93.18%, and AUC of 0.894. SHAP analysis of the XGB model revealed that age, weight, fat, income, body mass index, diabetes mellitus, salt, history of HTN, drinking, and smoking were risk factors associated with developing HTN. Conclusions: The proposed framework provides an effective tool for accurately identifying individuals in Ethiopia at risk of developing HTN at an early stage and may help with early prevention and individualized treatment.
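The F1-score is the harmonic mean of precision and recall, F1 = 2PR/(P + R). As a quick consistency check, plugging the precision (89.62%) and recall (97.04%) reported in this abstract into that formula reproduces the reported F1-score of 93.18%. The helper below is a hypothetical illustration, not the scikit-learn function of the same name.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for the XGB model in the abstract above.
f1 = f1_score(0.8962, 0.9704)
print(f"{f1:.4f}")  # 0.9318
```

The same check can be applied to any entry in this list that reports all three metrics.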

49

Amirullah, Afif, Umi Laili Yuhana, and Muhammad Alfian. "Improve Software Defect Prediction using Particle Swarm Optimization and Synthetic Minority Over-sampling Technique." Scientific Journal of Informatics 11, no. 4 (2025): 1127–36. https://doi.org/10.15294/sji.v11i4.16808.

Abstract:

Purpose: Early detection of software defects is essential to prevent software maintenance problems. Although many machine learning studies have addressed software defect prediction, most have not paid attention to data imbalance and feature correlation. This research focuses on overcoming dataset imbalance and provides new insights into the impact of two techniques, PSO-based feature selection and SMOTE oversampling, on the accuracy of software defect prediction. Methods: This research compares three algorithms (Random Forest, Logistic Regression, and XGBoost), applying PSO for feature selection and SMOTE to address imbalanced data. Algorithm performance is compared using F1-score, precision, recall, and accuracy to evaluate the effectiveness of each approach. Result: This research demonstrates the potential of SMOTE and PSO to enhance software defect detection models, particularly ensemble algorithms such as Random Forest (RF) and XGBoost (XGB). Applying SMOTE and PSO significantly increased the accuracy of RF to 87.63% and of XGB to 85.40%, but decreased the accuracy of Logistic Regression (LR) to 72.98%. The F1-score, precision, and recall also improved substantially for RF and XGB, but not for LR, whose accuracy decreased. Novelty: The comparison shows that SMOTE and PSO can improve Random Forest and XGB models for predicting software defects.

50

Zhou, Ke, Hailei Liu, Xiaobo Deng, Hao Wang, and Shenglan Zhang. "Comparison of Machine-Learning Algorithms for Near-Surface Air-Temperature Estimation from FY-4A AGRI Data." Advances in Meteorology 2020 (October 6, 2020): 1–14. http://dx.doi.org/10.1155/2020/8887364.

Abstract:

Six machine-learning approaches, including multivariate linear regression (MLR), gradient boosting decision tree, k-nearest neighbors, random forest, extreme gradient boosting (XGB), and deep neural network (DNN), were compared for near-surface air-temperature (Tair) estimation from observations by Fengyun-4A (FY-4A), the new generation of Chinese geostationary meteorological satellite. The brightness temperatures in the split-window channels of the Advanced Geostationary Radiation Imager (AGRI) on FY-4A and numerical weather prediction data from the Global Forecast System were used as predictor variables for Tair estimation. The performance of each model and the temporal and spatial distribution of the Tair estimation errors were analyzed. The results showed that the XGB model had the best overall performance, with an R2 of 0.902, a bias of −0.087°C, and a root-mean-square error of 1.946°C. The Tair errors of the XGB method showed less pronounced spatial variation than those of the other methods. The XGB model can provide stable, high-precision Tair estimates over large areas of China and can serve as a reference for machine-learning-based Tair estimation.
