Journal articles on the topic 'SHAP values'

Consult the top 50 journal articles for your research on the topic 'SHAP values.'

1

Zern, Artjom, Klaus Broelemann, and Gjergji Kasneci. "Interventional SHAP Values and Interaction Values for Piecewise Linear Regression Trees." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (2023): 11164–73. http://dx.doi.org/10.1609/aaai.v37i9.26322.

Abstract:
In recent years, game-theoretic Shapley values have gained increasing attention with respect to local model explanation by feature attributions. While the approach using Shapley values is model-independent, their (exact) computation is usually intractable, so efficient model-specific algorithms have been devised including approaches for decision trees or their ensembles in general. Our work goes further in this direction by extending the interventional TreeSHAP algorithm to piecewise linear regression trees, which gained more attention in the past few years. To this end, we introduce a decomposition of the contribution function based on decision paths, which allows a more comprehensible formulation of SHAP algorithms for tree-based models. Our algorithm can also be readily applied to computing SHAP interaction values of these models. In particular, as the main contribution of this paper, we provide a more efficient approach of interventional SHAP for tree-based models by precomputing statistics of the background data based on the tree structure.
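
The authors' algorithm targets piecewise linear model trees, which the open-source shap package does not cover; still, the interventional TreeSHAP setting they extend can be illustrated on an ordinary tree ensemble. A minimal sketch with assumed model and data, not the paper's code:

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Stand-in model: a piecewise-constant tree ensemble (the paper extends
# the interventional algorithm to piecewise *linear* regression trees).
X, y = make_regression(n_samples=500, n_features=8, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Interventional TreeSHAP: attributions are computed against a background
# dataset; the paper speeds this up by precomputing background statistics.
explainer = shap.TreeExplainer(model, data=X[:100],
                               feature_perturbation="interventional")
shap_values = explainer.shap_values(X[:5])  # one attribution row per instance

# SHAP interaction values (the shap library computes these with the
# path-dependent variant, which needs no background data).
interactions = shap.TreeExplainer(model).shap_interaction_values(X[:5])
```
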
2

Matthews, Spencer, and Brian Hartman. "mSHAP: SHAP Values for Two-Part Models." Risks 10, no. 1 (2021): 3. http://dx.doi.org/10.3390/risks10010003.

Abstract:
Two-part models are important to and used throughout insurance and actuarial science. Since insurance is required for registering a car, obtaining a mortgage, and participating in certain businesses, it is especially important that the models that price insurance policies are fair and non-discriminatory. Black box models can make it very difficult to know which covariates are influencing the results, resulting in model risk and bias. SHAP (SHapley Additive exPlanations) values enable interpretation of various black box models, but little progress has been made in two-part models. In this paper, we propose mSHAP (or multiplicative SHAP), a method for computing SHAP values of two-part models using the SHAP values of the individual models. This method will allow for the predictions of two-part models to be explained at an individual observation level. After developing mSHAP, we perform an in-depth simulation study. Although the kernelSHAP algorithm is also capable of computing approximate SHAP values for a two-part model, a comparison with our method demonstrates that mSHAP is exponentially faster. Ultimately, we apply mSHAP to a two-part ratemaking model for personal auto property damage insurance coverage. Additionally, an R package (mshap) is available to easily implement the method in a wide variety of applications.
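
For orientation, here is the model-agnostic baseline the abstract mentions: KernelSHAP applied to a two-part model wrapped as a single prediction function. This is a hedged sketch on synthetic data; mSHAP itself combines the per-part SHAP values instead (see the authors' R package mshap):

```python
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Toy two-part model: claim probability (frequency) times expected severity.
X, y_sev = make_regression(n_samples=400, n_features=6, random_state=1)
y_freq = (y_sev > np.median(y_sev)).astype(int)  # synthetic claim indicator

freq_model = RandomForestClassifier(random_state=1).fit(X, y_freq)
sev_model = RandomForestRegressor(random_state=1).fit(X, y_sev)

def two_part_predict(data):
    # Expected cost = P(claim) * predicted severity
    return freq_model.predict_proba(data)[:, 1] * sev_model.predict(data)

# KernelSHAP on the wrapped product model: correct but slow, which is
# exactly the inefficiency mSHAP is designed to avoid.
explainer = shap.KernelExplainer(two_part_predict, shap.sample(X, 50))
shap_values = explainer.shap_values(X[:3])
```
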
3

Utkin, Lev, and Andrei Konstantinov. "Ensembles of Random SHAPs." Algorithms 15, no. 11 (2022): 431. http://dx.doi.org/10.3390/a15110431.

Abstract:
The ensemble-based modifications of the well-known SHapley Additive exPlanations (SHAP) method for the local explanation of a black-box model are proposed. The modifications aim to simplify the SHAP which is computationally expensive when there is a large number of features. The main idea behind the proposed modifications is to approximate the SHAP by an ensemble of SHAPs with a smaller number of features. According to the first modification, called the ER-SHAP, several features are randomly selected many times from the feature set, and the Shapley values for the features are computed by means of “small” SHAPs. The explanation results are averaged to obtain the final Shapley values. According to the second modification, called the ERW-SHAP, several points are generated around the explained instance for diversity purposes, and the results of their explanation are combined with weights depending on the distances between the points and the explained instance. The third modification, called the ER-SHAP-RF, uses the random forest for a preliminary explanation of the instances and determines a feature probability distribution which is applied to the selection of the features in the ensemble-based procedure of the ER-SHAP. Many numerical experiments illustrating the proposed modifications demonstrate their efficiency and properties for a local explanation.
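
The core ER-SHAP loop is easy to sketch: repeatedly explain the instance in a randomly chosen low-dimensional feature subspace, then average. A simplified toy version (subsets explained with KernelSHAP while the remaining features stay pinned to the instance; all names are hypothetical):

```python
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=10, random_state=2)
model = RandomForestRegressor(random_state=2).fit(X, y)
x0, rng = X[0], np.random.default_rng(2)

def er_shap(n_rounds=20, subset_size=4):
    totals = np.zeros(X.shape[1])
    counts = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        idx = rng.choice(X.shape[1], size=subset_size, replace=False)

        def f(z):  # vary only the sampled features, pin the rest to x0
            full = np.tile(x0, (len(z), 1))
            full[:, idx] = z
            return model.predict(full)

        small = shap.KernelExplainer(f, X[:50][:, idx])  # a "small" SHAP
        totals[idx] += small.shap_values(x0[idx])
        counts[idx] += 1
    return totals / np.maximum(counts, 1)  # averaged Shapley estimates

phi = er_shap()
```
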
4

Sharipov, D. K., and A. D. Saidov. "Modified SHAP approach for interpretable prediction of cardiovascular complications." Проблемы вычислительной и прикладной математики, no. 2(64) (May 15, 2025): 114–22. https://doi.org/10.71310/pcam.2_64.2025.10.

Abstract:
This article explores the significance of modifying SHAP (SHapley Additive exPlanations) values to enhance model interpretability in machine learning. SHAP values provide a fair attribution of feature contributions, making AI-driven decision-making more transparent and reliable. However, raw SHAP values can sometimes be difficult to interpret due to feature interactions, noise, and inconsistencies in scale. The article discusses key techniques for modifying SHAP values, including feature aggregation, normalization, custom weighting, and noise reduction, to improve clarity and relevance in explanations. It also examines how these modifications align interpretations with real-world needs, ensuring that SHAP-based insights remain practical and actionable. By strategically refining SHAP values, data scientists can derive more meaningful explanations, improving trust in AI models and enhancing decision-making processes. The article provides a structured approach to modifying SHAP values, offering practical applications and benefits across various domains.
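
As a concrete, hypothetical illustration of the modifications discussed, the post-processing below aggregates raw SHAP values over feature groups, suppresses near-zero noise, and normalizes the result; the group names and the threshold are invented for the example:

```python
import numpy as np

def refine_shap(shap_row, feature_names, groups, noise_eps=0.01):
    """Aggregate, denoise, and normalize one row of raw SHAP values."""
    # Feature aggregation: sum contributions within each named group.
    agg = {g: sum(shap_row[feature_names.index(f)] for f in fs)
           for g, fs in groups.items()}
    vals = np.array(list(agg.values()), dtype=float)
    # Noise reduction: zero out contributions below a small threshold.
    vals = np.where(np.abs(vals) < noise_eps, 0.0, vals)
    # Normalization: rescale so absolute contributions sum to one.
    denom = np.abs(vals).sum() or 1.0
    return dict(zip(agg, vals / denom))

groups = {"blood_pressure": ["sys_bp", "dia_bp"], "lipids": ["ldl", "hdl"]}
print(refine_shap(np.array([0.20, 0.05, -0.12, 0.004]),
                  ["sys_bp", "dia_bp", "ldl", "hdl"], groups))
```
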
5

Létoffé, Olivier, Xuanxiang Huang, and Joao Marques-Silva. "Towards Trustable SHAP Scores." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 17 (2025): 18198–208. https://doi.org/10.1609/aaai.v39i17.34002.

Abstract:
SHAP scores represent the proposed use of the well-known Shapley values in eXplainable Artificial Intelligence (XAI). Recent work has shown that the exact computation of SHAP scores can produce unsatisfactory results. Concretely, for some ML models, SHAP scores will mislead with respect to relative feature influence. To address these limitations, recently proposed alternatives exploit different axiomatic aggregations, all of which are defined in terms of abductive explanations. However, the proposed axiomatic aggregations are not Shapley values. This paper investigates how SHAP scores can be modified so as to extend axiomatic aggregations to the case of Shapley values in XAI. More importantly, the proposed new definition of SHAP scores avoids all the known cases where unsatisfactory results have been identified. The paper also characterizes the complexity of computing the novel definition of SHAP scores, highlighting families of classifiers for which computing these scores is tractable. Furthermore, the paper proposes modifications to the existing implementations of SHAP scores. These modifications eliminate some of the known limitations of SHAP scores, and have negligible impact in terms of performance.
6

Suresh, Tamilarasi, Tsehay Admassu Assegie, Sangeetha Ganesan, Ravulapalli Lakshmi Tulasi, Radha Mothukuri, and Ayodeji Olalekan Salau. "Explainable extreme boosting model for breast cancer diagnosis." International Journal of Electrical and Computer Engineering (IJECE) 13, no. 5 (2023): 5764–69. https://doi.org/10.11591/ijece.v13i5.pp5764-5769.

Abstract:
This study investigates the Shapley additive explanation (SHAP) of the extreme boosting (XGBoost) model for breast cancer diagnosis. The study employed Wisconsin’s breast cancer dataset, characterized by 30 features extracted from an image of a breast cell. SHAP module generated different explainer values representing the impact of a breast cancer feature on breast cancer diagnosis. The experiment computed SHAP values of 569 samples of the breast cancer dataset. The SHAP explanation indicates perimeter and concave points have the highest impact on breast cancer diagnosis. SHAP explains the XGB model diagnosis outcome showing the features affecting the XGBoost model. The developed XGB model achieves an accuracy of 98.42%.
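
The pipeline in this abstract maps directly onto public tools, so a compact reconstruction is possible (a sketch using the standard scikit-learn copy of the Wisconsin data, not the authors' exact configuration):

```python
import shap
import xgboost
from sklearn.datasets import load_breast_cancer

# 569 samples, 30 features, as described in the abstract.
data = load_breast_cancer()
model = xgboost.XGBClassifier(eval_metric="logloss")
model.fit(data.data, data.target)

# TreeSHAP attributions for every sample: shape (569, 30).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data)

# Global view: features such as perimeter and concave points should
# surface near the top if the model behaves as reported.
shap.summary_plot(shap_values, data.data, feature_names=data.feature_names)
```
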
7

Suresh, Tamilarasi, Tsehay Admassu Assegie, Sangeetha Ganesan, Ravulapalli Lakshmi Tulasi, Radha Mothukuri, and Ayodeji Olalekan Salau. "Explainable extreme boosting model for breast cancer diagnosis." International Journal of Electrical and Computer Engineering (IJECE) 13, no. 5 (2023): 5764. http://dx.doi.org/10.11591/ijece.v13i5.pp5764-5769.

Abstract:
This study investigates the Shapley additive explanation (SHAP) of the extreme boosting (XGBoost) model for breast cancer diagnosis. The study employed Wisconsin’s breast cancer dataset, characterized by 30 features extracted from an image of a breast cell. SHAP module generated different explainer values representing the impact of a breast cancer feature on breast cancer diagnosis. The experiment computed SHAP values of 569 samples of the breast cancer dataset. The SHAP explanation indicates perimeter and concave points have the highest impact on breast cancer diagnosis. SHAP explains the XGB model diagnosis outcome showing the features affecting the XGBoost model. The developed XGB model achieves an accuracy of 98.42%.
8

Lamens, Alec, and Jürgen Bajorath. "Explaining Multiclass Compound Activity Predictions Using Counterfactuals and Shapley Values." Molecules 28, no. 14 (2023): 5601. http://dx.doi.org/10.3390/molecules28145601.

Abstract:
Most machine learning (ML) models produce black box predictions that are difficult, if not impossible, to understand. In pharmaceutical research, black box predictions work against the acceptance of ML models for guiding experimental work. Hence, there is increasing interest in approaches for explainable ML, which is a part of explainable artificial intelligence (XAI), to better understand prediction outcomes. Herein, we have devised a test system for the rationalization of multiclass compound activity prediction models that combines two approaches from XAI for feature relevance or importance analysis, including counterfactuals (CFs) and Shapley additive explanations (SHAP). For compounds with different single- and dual-target activities, we identified small compound modifications that induce feature changes inverting class label predictions. In combination with feature mapping, CFs and SHAP value calculations provide chemically intuitive explanations for model decisions.
9

Guo, Yaqiang, Fengying Ma, Peipei Li, et al. "Comprehensive SHAP Values and Single-Cell Sequencing Technology Reveal Key Cell Clusters in Bovine Skeletal Muscle." International Journal of Molecular Sciences 26, no. 5 (2025): 2054. https://doi.org/10.3390/ijms26052054.

Abstract:
The skeletal muscle of cattle is the main component of their muscular system, responsible for supporting and movement functions. However, there are still many unknown areas regarding the ranking of the importance of different types of cell populations within it. This study conducted in-depth research and made a series of significant findings. First, we trained 15 bovine skeletal muscle models and selected the best-performing model as the initial model. Based on the SHAP (Shapley Additive exPlanations) analysis of this initial model, we obtained the SHAP values of 476 important genes. Using the contributions of these 476 genes, we reconstructed a 476-gene SHAP value matrix, and relying solely on the interactions among these 476 genes, successfully mapped the single-cell atlas of bovine skeletal muscle. After retraining the model and further interpretation, we found that Myofiber cells are the most representative cell type in bovine skeletal muscle, followed by neutrophils. By determining the key genes of each cell type through SHAP values, we conducted analyses on the correlations among key genes and between cells for Myofiber cells, revealing the critical role these genes play in muscle growth and development. Further, by using protein language models, we performed cross-species comparisons between cattle and pigs, deepening our understanding of Myofiber cells as key cells in skeletal muscle, and exploring the common regulatory mechanisms of muscle development across species.
10

Baptista, Marcia L., Kai Goebel, and Elsa M. P. Henriques. "Relation between prognostics predictor evaluation metrics and local interpretability SHAP values." Artificial Intelligence 306 (May 2022): 103667. http://dx.doi.org/10.1016/j.artint.2022.103667.

11

Aymerich, María, Alejandra García-Baizán, Paolo Niccolò Franco, et al. "Radiomics-Based Classification of Clear Cell Renal Cell Carcinoma ISUP Grade: A Machine Learning Approach with SHAP-Enhanced Explainability." Diagnostics 15, no. 11 (2025): 1337. https://doi.org/10.3390/diagnostics15111337.

Abstract:
Background: Clear cell renal cell carcinoma (ccRCC) is the most common subtype of renal cancer, and its prognosis is closely linked to the International Society of Urological Pathology (ISUP) grade. While histopathological evaluation remains the gold standard for grading, non-invasive methods, such as radiomics, offer potential for automated classification. This study aims to develop a radiomics-based machine learning model for the ISUP grade classification of ccRCC using nephrographic-phase CT images, with an emphasis on model interpretability through SHAP (SHapley Additive exPlanations) values. Objective: To develop and interpret a radiomics-based machine learning model for classifying ISUP grade in clear cell renal cell carcinoma (ccRCC) using nephrographic-phase CT images. Materials and Methods: This retrospective study included 109 patients with histopathologically confirmed ccRCC. Radiomic features were extracted from the nephrographic-phase CT scans. Feature robustness was evaluated via intraclass correlation coefficient (ICC), followed by redundancy reduction using Pearson correlation and minimum Redundancy Maximum Relevance (mRMR). Logistic regression, support vector machine, and random forest classifiers were trained using 8-fold cross-validation. SHAP values were computed to assess feature contribution. Results: The logistic regression model achieved the highest classification performance, with an accuracy of 82% and an AUC of 0.86. SHAP analysis identified major axis length, busyness, and large area emphasis as the most influential features. These variables represented shape and texture information, critical for distinguishing between high and low ISUP grades. Conclusions: A radiomics-based logistic regression model using nephrographic-phase CT enables accurate, non-invasive classification of ccRCC according to ISUP grade. The use of SHAP values enhances model transparency, supporting clinical interpretability and potential adoption in precision oncology.
12

Scheda, Riccardo, and Stefano Diciotti. "Explanations of Machine Learning Models in Repeated Nested Cross-Validation: An Application in Age Prediction Using Brain Complexity Features." Applied Sciences 12, no. 13 (2022): 6681. http://dx.doi.org/10.3390/app12136681.

Abstract:
SHAP (Shapley additive explanations) is a framework for explainable AI that makes explanations locally and globally. In this work, we propose a general method to obtain representative SHAP values within a repeated nested cross-validation procedure and separately for the training and test sets of the different cross-validation rounds to assess the real generalization abilities of the explanations. We applied this method to predict individual age using brain complexity features extracted from MRI scans of 159 healthy subjects. In particular, we used four implementations of the fractal dimension (FD) of the cerebral cortex—a measurement of brain complexity. Representative SHAP values highlighted that the most recent implementation of the FD had the highest impact over the others and was among the top-ranking features for predicting age. SHAP rankings were not the same in the training and test sets, but the top-ranking features were consistent. In conclusion, we propose a method—and share all the source code—that allows a rigorous assessment of the SHAP explanations of a trained model in a repeated nested cross-validation setting.
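
A stripped-down sketch of the idea (plain k-fold rather than the paper's repeated nested scheme, with an assumed model and dataset): compute SHAP values on each held-out split, then summarize across folds.

```python
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

X, y = load_diabetes(return_X_y=True)
per_fold = []
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestRegressor(random_state=0).fit(X[train], y[train])
    sv = shap.TreeExplainer(model).shap_values(X[test])
    per_fold.append(np.abs(sv).mean(axis=0))  # mean |SHAP| on held-out data

representative = np.mean(per_fold, axis=0)  # fold-averaged importance
ranking = np.argsort(representative)[::-1]  # top features should be stable
```
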
13

Mitchell, Rory, Eibe Frank, and Geoffrey Holmes. "GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles." PeerJ Computer Science 8 (April 5, 2022): e880. http://dx.doi.org/10.7717/peerj-cs.880.

Abstract:
SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953). While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap (Lundberg et al., 2020) is available for decision tree models. However, despite its polynomial time complexity, TreeShap can become a significant bottleneck in practical machine learning pipelines when applied to large decision tree ensembles. Unfortunately, the complicated TreeShap algorithm is difficult to map to hardware accelerators such as GPUs. In this work, we present GPUTreeShap, a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units. Our approach first preprocesses each decision tree to isolate variable sized sub-problems from the original recursive algorithm, then solves a bin packing problem, and finally maps sub-problems to single-instruction, multiple-thread (SIMT) tasks for parallel execution with specialised hardware instructions. With a single NVIDIA Tesla V100-32 GPU, we achieve speedups of up to 19× for SHAP values, and speedups of up to 340× for SHAP interaction values, over a state-of-the-art multi-core CPU implementation executed on two 20-core Xeon E5-2698 v4 2.2 GHz CPUs. We also experiment with multi-GPU computing using eight V100 GPUs, demonstrating throughput of 1.2 M rows per second—equivalent CPU-based performance is estimated to require 6850 CPU cores.
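
XGBoost exposes TreeSHAP directly through its predict API, and in GPU-enabled builds these code paths are where GPUTreeShap plugs in; the calls below show the interface as a sketch:

```python
import xgboost
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=12, random_state=0)
dtrain = xgboost.DMatrix(X, label=y)
booster = xgboost.train({"max_depth": 4}, dtrain, num_boost_round=50)

# Per-feature SHAP values; the extra last column is the bias term.
contribs = booster.predict(dtrain, pred_contribs=True)          # (n, 13)

# Pairwise SHAP interaction values, the far more expensive quantity
# for which the paper reports up to 340x GPU speedups.
interactions = booster.predict(dtrain, pred_interactions=True)  # (n, 13, 13)
```
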
14

Feretzakis, Georgios, Aikaterini Sakagianni, Athanasios Anastasiou, et al. "Integrating Shapley Values into Machine Learning Techniques for Enhanced Predictions of Hospital Admissions." Applied Sciences 14, no. 13 (2024): 5925. http://dx.doi.org/10.3390/app14135925.

Abstract:
(1) Background: Predictive modeling is becoming increasingly relevant in healthcare, aiding in clinical decision making and improving patient outcomes. However, many of the most potent predictive models, such as deep learning algorithms, are inherently opaque, and their decisions are challenging to interpret. This study addresses this challenge by employing Shapley Additive Explanations (SHAP) to facilitate model interpretability while maintaining prediction accuracy. (2) Methods: We utilized Gradient Boosting Machines (GBMs) to predict patient outcomes in an emergency department setting, with a focus on model transparency to ensure actionable insights. (3) Results: Our analysis identifies “Acuity”, “Hours”, and “Age” as critical predictive features. We provide a detailed exploration of their intricate interactions and effects on the model’s predictions. The SHAP summary plots highlight that “Acuity” has the highest impact on predictions, followed by “Hours” and “Age”. Dependence plots further reveal that higher acuity levels and longer hours are associated with poorer patient outcomes, while age shows a non-linear relationship with outcomes. Additionally, SHAP interaction values uncover that the interaction between “Acuity” and “Hours” significantly influences predictions. (4) Conclusions: We employed force plots for individual-level interpretation, aligning with the current shift toward personalized medicine. This research highlights the potential of combining machine learning’s predictive power with interpretability, providing a promising route concerning a data-driven, evidence-based healthcare future.
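
The plot suite the authors describe is the standard shap workflow; below is a sketch with synthetic stand-ins for the paper's "Acuity", "Hours", and "Age" features:

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({"Acuity": rng.integers(1, 6, 2000),
                  "Hours": rng.uniform(0, 24, 2000),
                  "Age": rng.integers(18, 95, 2000)})
y = (3 * X["Acuity"] + 0.2 * X["Hours"]
     + rng.normal(0, 2, 2000) > 12).astype(int)  # synthetic outcome

model = GradientBoostingClassifier().fit(X, y)
explainer = shap.TreeExplainer(model)
sv = explainer.shap_values(X)

shap.summary_plot(sv, X)                   # global feature ranking
shap.dependence_plot("Acuity", sv, X)      # effect shape per feature
iv = explainer.shap_interaction_values(X)  # e.g., Acuity x Hours terms
shap.force_plot(explainer.expected_value, sv[0], X.iloc[0],
                matplotlib=True)           # one individual prediction
```
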
15

Sultan, Youssef, Mohammad Hammad, and Kelly Lester. "Visualizing Type 2 Diabetes Prevalence: Localizing Model Feature Impacts." International Journal of Data Science 5, no. 2 (2024): 64–74. https://doi.org/10.18517/ijods.5.2.64-74.2024.

Abstract:
SHAP values have been a common approach used to understand machine learning model predictions by averaging the marginal contributions of each feature across every possible permutation of the feature set. Our research provides a localized view of SHAP values contributing to Type 2 Diabetes (T2D) prevalence in the United States from 2012 - 2021 covering each year independently. Instead of visualizing SHAP feature importance across an entire geographical dataset using a beeswarm plot, our approach is more granular. We visualize individual SHAP values of Social Determinants of Health (SDOH) features by county on a Choropleth map. Additionally, we found that replacing geographic identifiers such as zipcode with precise latitude and longitude coordinates before applying KNN imputation reduced the MSE by 10%. Our visualization reveals how specific factors influence T2D prevalence at the county level using a non-linear machine learning model. By re-appending the initially preserved geographic identifiers for each record by index, we traced the contribution of each SHAP value back to its locality. Our approach opens up a new geographical vantage point of the mechanisms of model predictions, thereby identifying localized key factors influencing Type 2 Diabetes (T2D). This study extends the possibilities for tailored interventions and public health policies showing how some factors have varying predictive impact on an outcome at the geographic level.
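
The localization step, re-attaching preserved geographic identifiers to row-level SHAP values and aggregating per county, is simple to sketch (column names hypothetical; the county table would then be joined to geometries for the choropleth):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
n = 1000  # rows aligned, by index, with the model's SHAP output

# Stand-in SHAP values for two SDOH features.
shap_df = pd.DataFrame(rng.normal(size=(n, 2)),
                       columns=["food_access", "median_income"])
# Re-append the preserved identifier for each record by index.
shap_df["county_fips"] = rng.choice(["01001", "01003", "01005"], size=n)

county_impact = shap_df.groupby("county_fips").mean()
print(county_impact)  # one SHAP profile per county, ready for mapping
```
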
16

Kariyappa, Sanjay, Leonidas Tsepenekas, Freddy Lécué, and Daniele Magazzeni. "SHAP@k: Efficient and Probably Approximately Correct (PAC) Identification of Top-K Features." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 12 (2024): 13068–75. http://dx.doi.org/10.1609/aaai.v38i12.29205.

Abstract:
The SHAP framework provides a principled method to explain the predictions of a model by computing feature importance. Motivated by applications in finance, we introduce the Top-k Identification Problem (TkIP) (and its ordered variant TkIP-O), where the objective is to identify the subset (or ordered subset for TkIP-O) of k features corresponding to the highest SHAP values with PAC guarantees. While any sampling-based method that estimates SHAP values (such as KernelSHAP and SamplingSHAP) can be trivially adapted to solve TkIP, doing so is highly sample inefficient. Instead, we leverage the connection between SHAP values and multi-armed bandits (MAB) to show that both TkIP and TkIP-O can be reduced to variants of problems in MAB literature. This reduction allows us to use insights from the MAB literature to develop sample-efficient variants of KernelSHAP and SamplingSHAP. We propose KernelSHAP@k and SamplingSHAP@k for solving TkIP; along with KernelSHAP-O and SamplingSHAP-O to solve the ordering problem in TkIP-O. We perform extensive experiments using several credit-related datasets to show that our methods offer significant improvements of up to 40× in sample efficiency and 39× in runtime.
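
Stripped of the bandit machinery, the TkIP objective itself is easy to state in code: estimate SHAP values with a sampling explainer and read off the k largest magnitudes. The paper's contribution is reaching a PAC-correct top-k set with far fewer samples than this naive sketch spends:

```python
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=15, random_state=3)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Naive approach: spend a fixed sampling budget on every feature.
explainer = shap.SamplingExplainer(lambda d: model.predict_proba(d)[:, 1],
                                   X[:100])
sv = explainer.shap_values(X[:1], nsamples=2000)[0]

k = 5
top_k = np.argsort(np.abs(sv))[::-1][:k]  # the (unordered) TkIP answer
```
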
17

Lee, Jae-Min, Tae-In Kim, Chan-Jun Park, et al. "A Study on The Evaluation of SHAP Values for Ecotoxicity in Influent of WWTPs (Wastewater Treatment Plants) and Contribution by Water Quality Pollutants." Journal of the Korean Society for Environmental Technology 25, no. 2 (2024): 100–107. http://dx.doi.org/10.26511/jkset.25.2.3.

18

Ergenç, Cansu, and Rafet Aktaş. "Sector-specific financial forecasting with machine learning algorithm and SHAP interaction values." Financial Internet Quarterly 21, no. 1 (2025): 42–66. https://doi.org/10.2478/fiqf-2025-0004.

Abstract:
This study examines the application of machine learning models to predict financial performance in various sectors, using data from 21 companies listed in the BIST100 index (2013-2023). The primary objective is to assess the potential of these models in improving financial forecast accuracy and to emphasize the need for transparent, explainable approaches in finance. A range of machine learning models, including Linear Regression, Ridge, Lasso, Decision Tree, Bagging, Random Forest, AdaBoost, Gradient Boosting (GBM), LightGBM, and XGBoost, were evaluated. Gradient Boosting emerged as the best-performing model, with ensemble methods generally demonstrating superior accuracy and stability compared to linear models. To enhance interpretability, SHAP (SHapley Additive exPlanations) values were utilized, identifying the most influential variables affecting predictions and providing insights into model behavior. Sector-based analyses further revealed differences in model performance and feature impacts, offering a granular understanding of financial dynamics across industries. The findings highlight the effectiveness of machine learning, particularly ensemble methods, in forecasting financial performance. The study underscores the importance of using explainable models in finance to build trust and support decision-making. By integrating advanced techniques with interpretability tools, this research contributes to financial technology, advancing the adoption of machine learning in data-driven investment strategies.
19

Raghupathy, Bala Krishnan, Manyam Rajasekhar Reddy, Prasad Theeda, Elangovan Balasubramanian, Rajesh Kumar Namachivayam, and Manikandan Ganesan. "Harnessing Explainable Artificial Intelligence (XAI) based SHAPLEY Values and Ensemble Techniques for Accurate Alzheimer's Disease Diagnosis." Engineering, Technology & Applied Science Research 15, no. 2 (2025): 20743–47. https://doi.org/10.48084/etasr.9619.

Abstract:
Machine Learning (ML) is a dynamic method for managing extensive datasets to uncover significant patterns and hidden insights. ML has revolutionized numerous industries, from healthcare to finance, and from entertainment to transportation. Ensemble classifiers combined with Explainable AI (XAI) have surfaced as a significant asset in the field of Alzheimer's Disease (AD) diagnosis. Boosting EC techniques coupled with Shapley Additive Explanations (SHAP) offers a powerful approach to AD diagnosis. This paper investigates boosting ensemble ML schemes, such as XGBoost, LightGBM, and Gradient Boosting (GB), for AD diagnosis and SHAP for feature selection. The proposed scheme achieved efficient results, with an accuracy of more than 94% with minimum features for the detection process.
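
SHAP-driven feature selection of the kind described reduces to ranking features by mean |SHAP| and retraining on the head of the list. A sketch on synthetic data (the AD dataset and the paper's exact cutoff are not reproduced here):

```python
import numpy as np
import shap
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=800, n_features=30, n_informative=6,
                           random_state=8)
model = LGBMClassifier(verbose=-1).fit(X, y)

sv = shap.TreeExplainer(model).shap_values(X)
sv = sv[1] if isinstance(sv, list) else sv             # class-1 attributions
keep = np.argsort(np.abs(sv).mean(axis=0))[::-1][:8]   # minimal feature set

score = cross_val_score(LGBMClassifier(verbose=-1), X[:, keep], y, cv=5)
print(f"accuracy with {len(keep)} SHAP-selected features: {score.mean():.3f}")
```
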
20

Pezoa, R., L. Salinas, and C. Torres. "Explainability of High Energy Physics events classification using SHAP." Journal of Physics: Conference Series 2438, no. 1 (2023): 012082. http://dx.doi.org/10.1088/1742-6596/2438/1/012082.

Abstract:
Complex machine learning models have been fundamental for achieving accurate results regarding events classification in High Energy Physics (HEP). However, these complex models or black-box systems lack transparency and interpretability. In this work, we use the SHapley Additive exPlanations (SHAP) method for explaining the output of two event machine learning classifiers, based on eXtreme Gradient Boost (XGBoost) and deep neural networks (DNN). We compute SHAP values to interpret the results and analyze the importance of individual features, and the experiments show that the SHAP method has high potential for understanding complex machine learning models in the context of high energy physics.
21

Arenas, Marcelo, Pablo Barceló, Leopoldo Bertossi, and Mikaël Monet. "The Tractability of SHAP-Score-Based Explanations for Classification over Deterministic and Decomposable Boolean Circuits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (2021): 6670–78. http://dx.doi.org/10.1609/aaai.v35i8.16825.

Abstract:
Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits, also known as tractable Boolean circuits, generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P-hard).
22

Kee, Tris, and Winky K. O. Ho. "eXplainable Machine Learning for Real Estate: XGBoost and Shapley Values in Price Prediction." Civil Engineering Journal 11, no. 5 (2025): 2116–33. https://doi.org/10.28991/cej-2025-011-05-022.

Abstract:
This study examines the application of eXplainable Artificial Intelligence (XAI) in property market research, utilizing housing transaction data from Quarry Bay, Hong Kong. The research employs the XGBoost algorithm to predict property prices and subsequently computes Shapley Additive Explanations (SHAP) values to quantify feature importance. A beeswarm plot is used to visualize the distribution of SHAP values, uncovering complex relationships between prices and property characteristics. The findings demonstrate how features such as square footage and property age contribute to average price predictions, offering valuable insights for urban planning and real estate decision-making. In contrast to the traditional black-box models, this study integrates XAI methodologies to enhance model interpretability, thereby fostering trust in AI-driven market analyses. The novelty of this research lies in its combination of machine learning and explainable techniques, bridging the gap between predictive accuracy and interpretability in property valuation. By advancing data-driven decision-making, this study underscores the potential of XAI in promoting transparency and facilitating informed policymaking in the property market.
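
The beeswarm visualization step translates to a few lines of shap; the features below are invented stand-ins for the Quarry Bay attributes the paper uses:

```python
import numpy as np
import pandas as pd
import shap
import xgboost

rng = np.random.default_rng(7)
X = pd.DataFrame({"square_footage": rng.uniform(300, 1500, 800),
                  "property_age": rng.uniform(0, 60, 800),
                  "floor": rng.integers(1, 40, 800)})
price = 12 * X["square_footage"] - 50 * X["property_age"] \
        + rng.normal(0, 500, 800)  # synthetic prices

model = xgboost.XGBRegressor(n_estimators=200).fit(X, price)
explanation = shap.TreeExplainer(model)(X)  # a shap.Explanation object
shap.plots.beeswarm(explanation)            # SHAP value distributions
```
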
23

Alomari, Yazan, Marcia Baptista, and Mátyás Andó. "Integrating Network Theory and SHAP Analysis for Enhanced RUL Prediction in Aeronautics." PHM Society European Conference 8, no. 1 (2024): 15. http://dx.doi.org/10.36001/phme.2024.v8i1.4077.

Abstract:
The prediction of Remaining Useful Life (RUL) in aerospace engines is a challenge due to the complexity of these systems and the often-opaque nature of machine learning models. This opaqueness complicates the usability of predictions in scenarios where transparency is crucial for safety and operational decision-making. Our research introduces a machine learning framework that significantly improves both the interpretability and accuracy of RUL predictions. This framework incorporates SHapley Additive exPlanations (SHAP) with a surrogate model and Network Theory to clarify the decision-making processes in complex predictive models and enhance the understanding of hidden patterns of feature interaction. We developed a Feature Interaction Network (FIN) that uses SHAP values for node sizing and SHAP interaction values for edge weighting, offering detailed insights into the interdependencies among features that affect RUL predictions. Our approach was tested across 44 engines, showing RMSE values between 2 and 17 and NASA Scores from 0.2 to 1.5, indicating an increase in prediction accuracy. Furthermore, regarding interpretability, the application of our FIN revealed significant interactions between corrective speed and critical temperature points, key factors in engine efficiency and performance.
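
The FIN construction can be sketched directly from its definition: nodes sized by mean |SHAP|, edges weighted by mean |SHAP interaction|. The model and data below are stand-ins, not the RUL setup:

```python
import networkx as nx
import numpy as np
import shap
import xgboost
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=400, n_features=6, random_state=4)
model = xgboost.XGBRegressor(n_estimators=100).fit(X, y)

explainer = shap.TreeExplainer(model)
sv = explainer.shap_values(X)              # (n, d) attributions
iv = explainer.shap_interaction_values(X)  # (n, d, d) pairwise terms

node_size = np.abs(sv).mean(axis=0)        # node sizing, as in the FIN
edge_weight = np.abs(iv).mean(axis=0)      # edge weighting, as in the FIN

G = nx.Graph()
d = X.shape[1]
G.add_nodes_from((i, {"size": node_size[i]}) for i in range(d))
G.add_weighted_edges_from((i, j, edge_weight[i, j])
                          for i in range(d) for j in range(i + 1, d))
```
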
24

李, 珍玲. "Analysis of the Influencing Factors of Liquor Consumer Stickiness Based on MLP and SHAP Values." Operations Research and Fuzziology 13, no. 05 (2023): 5283–99. http://dx.doi.org/10.12677/orf.2023.135530.

25

Antonini, Antonella S., Juan Tanzola, Lucía Asiain, et al. "Machine Learning model interpretability using SHAP values: Application to Igneous Rock Classification task." Applied Computing and Geosciences 23 (September 2024): 100178. http://dx.doi.org/10.1016/j.acags.2024.100178.

26

Choi, Ho-Woong, and Sardor Abdirayimov. "Demonstrating the Power of SHAP Values in AI-Driven Classification of Marvel Characters." Journal of Multimedia Information System 11, no. 2 (2024): 167–72. http://dx.doi.org/10.33851/jmis.2024.11.2.167.

27

Gupta, Pooja, Srabanti Maji, and Ritika Mehra. "Compound Facial Emotion Recognition based on Facial Action Coding System and SHAP Values." International Research Journal on Advanced Science Hub 5, Issue 05S (2023): 26–34. http://dx.doi.org/10.47392/irjash.2023.s004.

28

Subianto, Muhammad, Ina Yatul Ulya, Evi Ramadhani, Bagus Sartono, and Alfian Futuhul Hadi. "Application of SHAP on CatBoost classification for identification of variabels characterizing food insecurity occurrences in Aceh Province households." Jurnal Natural 23, no. 3 (2023): 230–44. http://dx.doi.org/10.24815/jn.v23i3.33548.

Abstract:
Classification is the process of building a model that can distinguish between different classes of data. The model aims to predict the class of testing data based on patterns or relationships learned from training data. One of the data processing algorithms used to build classification models is Categorical Boosting (CatBoost). However, in general, the resulting models are difficult to interpret. To facilitate the interpretation of complex classification models, methods such as SHAP (SHapley Additive exPlanations) are needed. SHAP is a method to explain individual predictions. SHAP is based on the game-theoretically optimal Shapley values. In this study, an analysis of important SHAP variables was conducted on the CatBoost classification model to identify variables characterizing occurrences of food insecurity in households. The data used in this study was obtained from the Survei Sosial Ekonomi Nasional (Susenas) in March 2021 in Aceh Province, sourced from the Badan Pusat Statistik (BPS). There are 13,126 observations in the research data. The results from four evaluated classification models on the testing data showed that the best model had accuracy, sensitivity, specificity, and AUC values of 0.703, 0.349, 0.798, and 0.637, respectively. Furthermore, the results of the analysis of important SHAP variables showed that the variables number of household members who smoke, education of the household head, wall types, drinking water source, and decent sanitation significantly contributed to the occurrences of food insecurity in households in Aceh Province in the year 2021.
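
CatBoost computes TreeSHAP natively, which presumably underlies this analysis; a generic sketch (synthetic stand-in for the Susenas survey features):

```python
import numpy as np
from catboost import CatBoostClassifier, Pool
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=13, random_state=5)
model = CatBoostClassifier(iterations=200, verbose=False).fit(X, y)

# Native TreeSHAP: the last column of the result is the expected value.
shap_vals = model.get_feature_importance(Pool(X, y), type="ShapValues")
attributions, base = shap_vals[:, :-1], shap_vals[:, -1]

# Rank candidate drivers by mean absolute contribution.
ranking = np.argsort(np.abs(attributions).mean(axis=0))[::-1]
```
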
29

Vahed, Sepideh Zununi, Seyed Mahdi Hosseiniyan Khatibi, Yalda Rahbar Saadat, et al. "Introducing effective genes in lymph node metastasis of breast cancer patients using SHAP values based on the mRNA expression data." PLOS ONE 19, no. 8 (2024): e0308531. http://dx.doi.org/10.1371/journal.pone.0308531.

Abstract:
Objective: Breast cancer, a global concern predominantly impacting women, poses a significant threat when not identified early. While survival rates for breast cancer patients are typically favorable, the emergence of regional metastases markedly diminishes survival prospects. Detecting metastases and comprehending their molecular underpinnings are crucial for tailoring effective treatments and improving patient survival outcomes. Methods: Various artificial intelligence methods and techniques were employed in this study to achieve accurate outcomes. Initially, the data was organized and underwent hold-out cross-validation, data cleaning, and normalization. Subsequently, feature selection was conducted using ANOVA and binary Particle Swarm Optimization (PSO). During the analysis phase, the discriminative power of the selected features was evaluated using machine learning classification algorithms. Finally, the selected features were considered, and the SHAP algorithm was utilized to identify the most significant features for enhancing the decoding of dominant molecular mechanisms in lymph node metastases. Results: In this study, five main steps were followed for the analysis of mRNA expression data: reading, preprocessing, feature selection, classification, and SHAP algorithm. The RF classifier utilized the candidate mRNAs to differentiate between negative and positive categories with an accuracy of 61% and an AUC of 0.6. During the SHAP process, intriguing relationships between the selected mRNAs and positive/negative lymph node status were discovered. The results indicate that GDF5, BAHCC1, LCN2, FGF14-AS2, and IDH2 are among the top five most impactful mRNAs based on their SHAP values. Conclusion: The prominent identified mRNAs including GDF5, BAHCC1, LCN2, FGF14-AS2, and IDH2, are implicated in lymph node metastasis. This study holds promise in elucidating a thorough insight into key candidate genes that could significantly impact the early detection and tailored therapeutic strategies for lymph node metastasis in patients with breast cancer.
30

Ikushima, Hiroaki, Kousuke Watanabe, Aya Shinozaki-Ushiku, Katsutoshi Oda, and Hidenori Kage. "A retrospective machine learning–based analysis of nationwide cancer comprehensive genomic profiling data to identify features associated with recommendation of mutation-based therapy." Journal of Clinical Oncology 42, no. 16_suppl (2024): e13510-e13510. http://dx.doi.org/10.1200/jco.2024.42.16_suppl.e13510.

Abstract:
Background: Comprehensive genomic profiling (CGP) has played key roles in cancer precision medicine through optimization of therapeutic interventions based on genomic alterations in cancer cells. However, the probability of discovering mutation-based treatments through CGP remains low. To enhance the effectiveness and efficiency of cancer precision medicine, it is crucial to identify patients who are likely to benefit from CGP tests. This study aims to identify characteristics of patients in which mutation-based treatments are discovered by CGP tests. Methods: We retrospectively analyzed data from 60,655 patients who underwent CGP tests and were registered in the Center for Cancer Genomics and Advanced Therapeutics (C-CAT), the national data center for cancer CGP in Japan. The C-CAT database covers 99.7% of cancer patients who have undergone CGP tests in Japan. Major cancer types included 10,182 cases of bowel cancer, 8,691 pancreas cancer, 5,062 biliary tract cancer, and 3,777 breast cancer. We developed an eXtreme Gradient Boosting (XGBoost) model using machine learning, and used clinical information as input to predict whether one or more Japanese Pharmaceuticals and Medical Devices Agency (PMDA)-approved drugs are discovered through CGP tests or not. Shapley Additive Explanations (SHAP) was employed to extract significant features that contribute to the model prediction. Results: The prediction model achieved an area under the receiver operating characteristic curve of 0.826 for the overall cancer population. Positive SHAP values were observed for patients with breast (mean SHAP in breast cancer patients: 1.38), lung (1.08), bowel (0.75), and pancreas cancers (0.35), while negative SHAP values were associated with head and neck (-1.75), cervical cancers (-1.54), and brain (-1.33). Positive SHAP values were also associated with presence of liver or lymph node metastasis (0.23, 0.08), shorter intervals between diagnosis and CGP testing or specimen collection and CGP testing, and advanced age. Similar trends were observed in cancer-type-specific prediction models, which also identified their own unique features. In the adolescent and young adult (AYA) age group, primary brain tumors were strongly associated with negative SHAP values (-1.37). Conclusions: Our machine learning-based analysis of nationwide CGP data identified features that predict cases in which mutation-based treatments are discovered by CGP tests, both in the overall cancer population and within specific cancer types and the AYA age group. Expedited CGP testing is recommended for patients who match the identified profile to facilitate early targeted therapy interventions.
31

Chen, Jun-Wei, Hsin-An Chen, Tzu-Chi Liu, Tzu-En Wu, and Chi-Jie Lu. "The Potential of SHAP and Machine Learning for Personalized Explanations of Influencing Factors in Myopic Treatment for Children." Medicina 61, no. 1 (2024): 16. https://doi.org/10.3390/medicina61010016.

Abstract:
Background and Objectives: The rising prevalence of myopia is a significant global health concern. Atropine eye drops are commonly used to slow myopia progression in children, but their long-term use raises concern about intraocular pressure (IOP). This study uses SHapley Additive exPlanations (SHAP) to improve the interpretability of machine learning (ML) model predicting end IOP, offering clinicians explainable insights for personalized patient management. Materials and Methods: This retrospective study analyzed data from 1191 individual eyes of 639 boys and 552 girls with myopia treated with atropine. The average age of the whole group was 10.6 ± 2.5 years old. The refractive error of spherical equivalent (SE) in myopia degree was base SE at 2.63D and end SE at 3.12D. Data were collected from clinical records, including demographic information, IOP measurements, and atropine treatment details. The patients were divided into two subgroups based on a baseline IOP of 14 mmHg. ML models, including Lasso, CART, XGB, and RF, were developed to predict the end IOP value. Then, the best-performing model was further interpreted using SHAP values. The SHAP module created a personalized and dynamic graphic to illustrate how various factors (e.g., age, sex, cumulative duration, and dosage of atropine treatment) affect the end IOP. Results: RF showed the best performance, with superior error metrics in both subgroups. The interpretation of RF with SHAP revealed that age and the recruitment duration of atropine consistently influenced IOP across subgroups, while other variables had varying effects. SHAP values also offer insights, helping clinicians understand how different factors contribute to predicted IOP value in individual children. Conclusions: SHAP provides an alternative approach to understand the factors affecting IOP in children with myopia treated with atropine. Its enhanced interpretability helps clinicians make informed decisions, improving the safety and efficacy of myopia management. This study demonstrates the potential of combining SHAP with ML models for personalized care in ophthalmology.
32

Padarian, José, Alex B. McBratney, and Budiman Minasny. "Game theory interpretation of digital soil mapping convolutional neural networks." SOIL 6, no. 2 (2020): 389–97. http://dx.doi.org/10.5194/soil-6-389-2020.

Abstract:
The use of complex models such as deep neural networks has yielded large improvements in predictive tasks in many fields including digital soil mapping. One of the concerns about using these models is that they are perceived as black boxes with low interpretability. In this paper we introduce the use of game theory, specifically Shapley additive explanations (SHAP) values, in order to interpret a digital soil mapping model. SHAP values represent the contribution of a covariate to the final model predictions. We applied this method to a multi-task convolutional neural network trained to predict soil organic carbon in Chile. The results show the contribution of each covariate to the model predictions in three different contexts: (a) at a local level, showing the contribution of the various covariates for a single prediction; (b) a global understanding of the covariate contribution; and (c) a spatial interpretation of their contributions. The latter constitutes a novel application of SHAP values and also the first detailed analysis of a model in a spatial context. The analysis of a SOC (soil organic carbon) model in Chile corroborated that the model is capturing sensible relationships between SOC and rainfall, temperature, elevation, slope, and topographic wetness index. The results agree with commonly reported relationships, highlighting environmental thresholds that coincide with significant areas within the study area. This contribution addresses the limitations of the current interpretation of models in digital soil mapping, especially in a spatial context. We believe that SHAP values are a valuable tool that should be included within the DSM (digital soil mapping) framework, since they address the important concerns regarding the interpretability of more complex models. The model interpretation is a crucial step that could lead to generating new knowledge to improve our understanding of soils.
33

Li, Xuan, Chaofan Wu, Michael E. Meadows, et al. "Factors Underlying Spatiotemporal Variations in Atmospheric PM2.5 Concentrations in Zhejiang Province, China." Remote Sensing 13, no. 15 (2021): 3011. http://dx.doi.org/10.3390/rs13153011.

Abstract:
Fine particulate matter in the lower atmosphere (PM2.5) continues to be a major public health problem globally. Identifying the key contributors to PM2.5 pollution is important in monitoring and managing atmospheric quality, for example, in controlling haze. Previous research has been aimed at quantifying the relationship between PM2.5 values and their underlying factors, but the spatial and temporal dynamics of these factors are not well understood. Based on random forest and Shapley additive explanation (SHAP) algorithms, this study analyses the spatiotemporal variations in selected key factors influencing PM2.5 in Zhejiang Province, China, for the period 2000–2019. The results indicate that, while factors influencing PM2.5 varied significantly during the period studied, SHAP values suggest that there is consistency in their relative importance as follows: meteorological factors (e.g., atmospheric pressure) > socioeconomic factors (e.g., gross domestic product, GDP) > topography and land cover factors (e.g., elevation). The contribution of GDP and transportation factors initially increased but has declined in the recent past, indicating that economic and infrastructural development does not necessarily result in increased PM2.5 concentrations. Vegetation productivity, as indicated by changes in NDVI, is demonstrated to have become more important in improving air quality, and the area of the province over which it constrains PM2.5 concentrations has increased between 2000 and 2019. Mapping of SHAP values suggests that, although the relative importance of industrial emissions has declined during the period studied, the actual area positively impacted by such emissions has actually increased. Despite developments in government policy, greater efforts to conserve energy and reduce emissions are still needed. The study further demonstrates that the combination of random forest and SHAP methods provides a valuable means to identify regional differences in key factors affecting atmospheric PM2.5 values and offers a reliable reference for pollution control strategies.
34

Li, Richard, Ashwin Shinde, An Liu, et al. "Machine Learning–Based Interpretation and Visualization of Nonlinear Interactions in Prostate Cancer Survival." JCO Clinical Cancer Informatics, no. 4 (September 2020): 637–46. http://dx.doi.org/10.1200/cci.20.00002.

Abstract:
PURPOSE: Shapley additive explanation (SHAP) values represent a unified approach to interpreting predictions made by complex machine learning (ML) models, with superior consistency and accuracy compared with prior methods. We describe a novel application of SHAP values to the prediction of mortality risk in prostate cancer. METHODS: Patients with nonmetastatic, node-negative prostate cancer, diagnosed between 2004 and 2015, were identified using the National Cancer Database. Model features were specified a priori: age, prostate-specific antigen (PSA), Gleason score, percent positive cores (PPC), comorbidity score, and clinical T stage. We trained a gradient-boosted tree model and applied SHAP values to model predictions. Open-source libraries in Python 3.7 were used for all analyses. RESULTS: We identified 372,808 patients meeting the inclusion criteria. When analyzing the interaction between PSA and Gleason score, we demonstrated consistency with the literature using the example of low-PSA, high-Gleason prostate cancer, recently identified as a unique entity with a poor prognosis. When analyzing the PPC-Gleason score interaction, we identified a novel finding of stronger interaction effects in patients with Gleason ≥ 8 disease compared with Gleason 6-7 disease, particularly with PPC ≥ 50%. Subsequent confirmatory linear analyses supported this finding: 5-year overall survival in Gleason ≥ 8 patients was 87.7% with PPC < 50% versus 77.2% with PPC ≥ 50% (P < .001), compared with 89.1% versus 86.0% in Gleason 7 patients (P < .001), with a significant interaction term between PPC ≥ 50% and Gleason ≥ 8 (P < .001). CONCLUSION: We describe a novel application of SHAP values for modeling and visualizing nonlinear interaction effects in prostate cancer. This ML-based approach is a promising technique with the potential to meaningfully improve risk stratification and staging systems.
35

Kim, Donghyun, Gian Antariksa, Melia Putri Handayani, Sangbong Lee, and Jihwan Lee. "Explainable Anomaly Detection Framework for Maritime Main Engine Sensor Data." Sensors 21, no. 15 (2021): 5200. http://dx.doi.org/10.3390/s21155200.

Abstract:
In this study, we proposed a data-driven approach to the condition monitoring of the marine engine. Although several unsupervised methods in the maritime industry have existed, the common limitation was the interpretation of the anomaly; they do not explain why the model classifies specific data instances as an anomaly. This study combines explainable AI techniques with anomaly detection algorithm to overcome the limitation above. As an explainable AI method, this study adopts Shapley Additive exPlanations (SHAP), which is theoretically solid and compatible with any kind of machine learning algorithm. SHAP enables us to measure the marginal contribution of each sensor variable to an anomaly. Thus, one can easily specify which sensor is responsible for the specific anomaly. To illustrate our framework, the actual sensor stream obtained from the cargo vessel collected over 10 months was analyzed. In this analysis, we performed hierarchical clustering analysis with transformed SHAP values to interpret and group common anomaly patterns. We showed that anomaly interpretation and segmentation using SHAP value provides more useful interpretation compared to the case without using SHAP value.
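
The final step, hierarchical clustering of per-instance SHAP vectors to group common anomaly patterns, can be sketched independently of the detector (the SHAP matrix below is a random stand-in):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(6)
shap_matrix = rng.normal(size=(40, 8))  # one SHAP row per flagged anomaly

Z = linkage(shap_matrix, method="ward")          # agglomerative tree
labels = fcluster(Z, t=3, criterion="maxclust")  # 3 common anomaly patterns

for c in np.unique(labels):
    dominant = np.abs(shap_matrix[labels == c]).mean(axis=0).argmax()
    print(f"cluster {c}: sensor {dominant} drives the explanation")
```
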
36

Assegie, Tsehay Admassu. "Evaluation of the Shapley Additive Explanation Technique for Ensemble Learning Methods." Proceedings of Engineering and Technology Innovation 21 (April 22, 2022): 20–26. http://dx.doi.org/10.46604/peti.2022.9025.

Abstract:
This study aims to explore the effectiveness of the Shapley additive explanation (SHAP) technique in developing a transparent, interpretable, and explainable ensemble method for heart disease diagnosis using random forest algorithms. Firstly, the features with high impact on the heart disease prediction are selected by SHAP using 1025 heart disease datasets, obtained from a publicly available Kaggle data repository. After that, the features which have the greatest influence on the heart disease prediction are used to develop an interpretable ensemble learning model to automate the heart disease diagnosis by employing the SHAP technique. Finally, the performance of the developed model is evaluated. The SHAP values are used to obtain better performance of heart disease diagnosis. The experimental result shows that 100% prediction accuracy is achieved with the developed model. In addition, the experiment shows that age, chest pain, and maximum heart rate have positive impact on the prediction outcome.
37

Queiró Silva, R., D. Seoane-Mato, A. Laiz, et al. "POS1074 MINIMAL DISEASE ACTIVITY (MDA) IN PATIENTS WITH RECENT-ONSET PSORIATIC ARTHRITIS. PREDICTIVE MODEL BASED ON MACHINE LEARNING." Annals of the Rheumatic Diseases 81, Suppl 1 (2022): 861–62. http://dx.doi.org/10.1136/annrheumdis-2022-eular.1841.

Abstract:
Background: Very few data are available on predictors of minimal disease activity (MDA) in patients with recent-onset psoriatic arthritis (PsA). Such data are crucial, since the therapeutic measures used to change the adverse course of PsA are more likely to succeed if we intervene early.

Objectives: To detect patient and disease variables associated with achieving MDA in patients with recent-onset PsA.

Methods: We performed a multicenter observational prospective study (2-year follow-up, regular annual visits) promoted by the Spanish Society of Rheumatology. Patients aged ≥18 years who fulfilled the CASPAR criteria, with less than 2 years since the onset of symptoms, were included. The intention at the baseline visit was to reflect the patient’s situation before the course of the disease was modified by the treatments prescribed by the rheumatologist. All patients gave their informed consent, and the study was approved by the Clinical Research Ethics Committee of the Principality of Asturias. MDA was defined as fulfillment of at least 5 of the following: ≤1 tender joint; ≤1 swollen joint; PASI ≤1 or BSA ≤3%; patient-reported pain on the visual analog scale (VAS) ≤1.5; patient-reported overall disease activity ≤2; HAQ score ≤0.5; ≤1 painful enthesis [1]. The dataset contained the independent variables from the baseline visit and from follow-up visit 1, matched with the outcome measures from follow-up visits 1 and 2, respectively. We trained a random forest-type machine learning algorithm to analyze the association between the outcome measure and the variables selected in the bivariate analysis. To understand how the model uses the variables to make its predictions, we applied the SHAP technique. This approach assigns a SHAP value to each value of each variable according to the extent to which it affects the model’s prediction (the higher the absolute SHAP value, the greater the influence of that data item) and the direction of the effect (a positive SHAP value pushes the prediction toward a higher value). The SHAP summary graphs order the predictors by their importance in the predictions of the model, calculated as the mean of the SHAP values assigned to each data item of a variable; mean values <0.01 indicate low importance in the model. We used a confusion matrix to visualize model performance: it shows the real class of the data items together with the predicted class and records the number of hits and misses.

Results: The sample comprised 158 patients, of whom 14.6% were lost to follow-up; 55.5% and 58.3% of the patients had MDA at the first and second follow-up visits, respectively. The importance of the variables in the model according to the mean of the SHAP values is shown in Table 1. The variables with the greatest predictive ability were global pain, impact of the disease (PsAID), patient global assessment of disease, and physical function (HAQ-Disability Index). The SHAP values for each value of each variable are shown in Figure 1. The percentage of hits in the confusion matrix was 85.94%.

Table 1. Variables in the predictions of the random forest for MDA according to the SHAP method (importance = mean of the SHAP values for each value of the variable; MDA: minimal disease activity).
Global pain: 0.069
PsAID: 0.064
Patient global assessment of disease: 0.047
HAQ: 0.044
Articular pattern at diagnosis: 0.029
Physician global assessment of disease: 0.023
Tender joint count: 0.014
Sex: 0.009
Weekly alcohol consumption: 0.009

Figure 1. SHAP summary graph.

Conclusion: A key objective in the management of PsA should be the control of pain, which is not always associated with inflammatory burden, and the establishment of measures to better control the various domains of PsA.

References: [1] Coates LC, Fransen J, Helliwell PS. Defining minimal disease activity in psoriatic arthritis: a proposed objective target for treatment. Ann Rheum Dis. 2010;69:48-53.

Acknowledgements: The authors would like to acknowledge José Luis Fernández Sueiro for the conception of the study; José Miguel Carrasco for his contribution to the design of the study; Nuria Montero and Cristina Oliva for their contribution to data monitoring; Ana González Marcos and Cristina Pruenza for their contribution to data analysis; and Thomas O’Boyle for the translation of the manuscript.

Disclosure of Interests: None declared.
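As an illustration of the workflow this abstract describes, the sketch below fits a random forest, ranks variables by mean absolute SHAP value, and prints a confusion matrix. It is a minimal sketch on synthetic stand-in data; the column names are hypothetical placeholders, not the study's actual fields.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data; in the study these would be baseline predictors and the MDA outcome.
Xa, y = make_classification(n_samples=300, n_features=6, random_state=0)
X = pd.DataFrame(Xa, columns=["global_pain", "psaid", "patient_global",
                              "haq", "articular_pattern", "physician_global"])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)

# TreeExplainer computes exact SHAP values for tree ensembles.
sv = shap.TreeExplainer(model).shap_values(X_test)
sv = sv[1] if isinstance(sv, list) else sv[..., 1]  # contributions toward the MDA class

# Importance = mean absolute SHAP value per variable; <0.01 suggests low importance.
importance = np.abs(sv).mean(axis=0)
for name, imp in sorted(zip(X.columns, importance), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")

# Confusion matrix: real class vs. predicted class (hits and misses).
print(confusion_matrix(y_test, model.predict(X_test)))
```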
APA, Harvard, Vancouver, ISO, and other styles
38

He, Bo, Ping Ye, Marta Taghavi, et al. "A machine learning model to predict treatment initiation among new patients in a community oncology network." Journal of Clinical Oncology 41, no. 16_suppl (2023): e13539-e13539. http://dx.doi.org/10.1200/jco.2023.41.16_suppl.e13539.

Full text
Abstract:
e13539 Background: In The US Oncology Network (The Network), about one-third of new patients with a cancer diagnosis started intravenous (IV) treatment after their first visit. The rest either came in for a consult only or may have received other treatments such as radiation, surgery, or oral therapy. We developed a machine learning model to predict IV treatment initiation among new patients and discovered features associated with the patient’s decision. This model could suggest interventions to improve patients’ access to care. Methods: A retrospective cohort was formed by identifying new patients with cancer from 27 practices in The Network between July 1, 2021 and June 30, 2022. Structured data were extracted and processed from the electronic health records, claims, physician referrals, and the American Community Survey. Patient characteristics included demographics, clinical information, payor types, and socioeconomic status. The referral pattern, the geographic region of practices, and the provider workload were considered as well. Gradient-boosted decision tree, random forest, neural network, and logistic regression models were developed to predict the probability of starting IV treatment within 90 days of the first visit. Model performance was evaluated based on the area under the receiver operating characteristic (AUROC) curve using cross-validation and a 4:1 training/validation random split. Shapley Additive Explanations (SHAP) values were applied to the model to explain feature importance. Results: A total of 117,340 new patients with a cancer diagnosis were included in the study, of whom 35% initiated IV treatment within 90 days of the first visit. A gradient-boosted decision tree algorithm with control of the imbalanced label was chosen as the final model because of its performance and its ability to handle missing values. The model achieved an AUROC of 0.80 on the validation dataset with both cross-validation and the 4:1 training/validation random split. Based on the SHAP values (log odds), we found that clinical information, including diagnosis and stage, comprised the most important features for predicting the initiation of IV treatment (mean absolute SHAP = 0.31 and 1.03, respectively). Medicaid contributed least to treatment initiation among all insurance types (mean absolute SHAP = 0.01). In addition, younger age and male sex were associated with a higher chance of starting IV treatment (Pearson correlation = -0.41, p-value < 0.01 for age versus SHAP values; p-value < 0.01, two-sided t-test for SHAP values by gender). Conclusions: This study reports a machine learning model to predict IV treatment initiation among new patients with cancer. Clinical features impact the treatment decision more than others. This model could guide patient services and direct personalized care navigation. Further, it sheds light on future interventions that could enhance prompt patient access to treatment.
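The abstract's SHAP-based checks, mean absolute SHAP in log odds plus a Pearson correlation between a feature's values and its SHAP values, can be sketched as follows. This is an assumed setup on synthetic data, not the authors' code; the feature names are placeholders.

```python
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from scipy.stats import pearsonr
from sklearn.datasets import make_classification

Xa, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X = pd.DataFrame(Xa, columns=["diagnosis_code", "stage", "age", "payor_type", "workload"])

# scale_pos_weight is one common way to control an imbalanced label in XGBoost.
model = xgb.XGBClassifier(scale_pos_weight=(y == 0).sum() / (y == 1).sum(),
                          n_estimators=300, random_state=0).fit(X, y)

# For XGBoost, TreeExplainer returns contributions on the log-odds (margin) scale.
sv = shap.TreeExplainer(model).shap_values(X)
print("mean |SHAP| per feature:",
      dict(zip(X.columns, np.abs(sv).mean(axis=0).round(3))))

# A negative correlation would mean higher age pushes predictions toward no treatment.
r, p = pearsonr(X["age"], sv[:, X.columns.get_loc("age")])
print(f"age vs. SHAP: r={r:.2f}, p={p:.3g}")
```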
APA, Harvard, Vancouver, ISO, and other styles
39

Abd Karim, Shahiratul Amalina, Ummul Hanan Mohamad, and Puteri Nor Ellyza Nohuddin. "Discovery of Interpretable Patterns of Breast Cancer Diagnosis via Class Association Rule Mining (CARM) With SHAP-Based Explainable AI (XAI)." Malaysian Journal of Fundamental and Applied Sciences 21, no. 3 (2025): 2008–31. https://doi.org/10.11113/mjfas.v21n3.3792.

Full text
Abstract:
Breast cancer remains the most common cancer among women globally, highlighting the importance of early and reliable diagnostic methods. While previous studies have applied association rule mining (ARM) to explore factors contributing to breast cancer, many lacked robust validation of the extracted rules. To address this gap and deepen our understanding of the key biological markers linked to the disease, this study proposes a hybrid framework that integrates Class Association Rule Mining (CARM) with SHapley Additive exPlanations (SHAP) values based on Random Forest (RF) and Gradient Boost (GB) models to uncover and validate meaningful diagnostic patterns. Using the Breast Cancer Coimbra (BCC) dataset, comprising 116 patient samples and nine biological markers, a total of 723,938 association rules (AR) were generated, from which 17,720 significant class association rules (CAR) were extracted. These rules were pruned using lift, leverage, and conviction to retain the most relevant ones. Among the healthy group, combinations involving low glucose, low insulin, low resistin, and low Homeostatic Model Assessment (HOMA) were consistently observed, while high BMI appeared particularly among younger individuals; these features were associated with negative SHAP values, validating their contribution to healthy classifications. In contrast, common patterns among middle-aged individuals involved high glucose, medium resistin, and medium Monocyte Chemoattractant Protein-1 (MCP.1); these features consistently showed strong positive SHAP values across both classifiers, highlighting their influence in predicting patient outcomes. By combining the rule extraction of CARM with feature contributions from SHAP, this study provides a validated and interpretable approach for breast cancer diagnosis. The findings highlight the importance of feature interactions and offer promising directions for personalized risk assessment and early detection.
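A hedged sketch of the rule-mining-plus-pruning step, using the mlxtend library (one possible tool, not necessarily the authors'): rules whose consequent is the class label are kept only if they pass lift, leverage, and conviction thresholds. The toy one-hot columns stand in for discretized BCC markers.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy one-hot data; real rows would encode discretized BCC marker levels plus the class.
df = pd.DataFrame({
    "glucose_low":   [1, 1, 0, 1, 0, 1],
    "insulin_low":   [1, 1, 0, 1, 0, 0],
    "resistin_low":  [1, 0, 0, 1, 0, 1],
    "class_healthy": [1, 1, 0, 1, 0, 1],
}).astype(bool)

itemsets = apriori(df, min_support=0.3, use_colnames=True)
rules = association_rules(itemsets, metric="lift", min_threshold=1.1)

# Keep class association rules (consequent = class label) passing all three criteria.
is_car = rules["consequents"].apply(lambda c: c == frozenset(["class_healthy"]))
cars = rules[is_car & (rules["leverage"] > 0) & (rules["conviction"] > 1.2)]
print(cars[["antecedents", "support", "confidence", "lift", "leverage", "conviction"]])
```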
APA, Harvard, Vancouver, ISO, and other styles
40

Azadi, Mohammad, and Mahmood Matin. "Interpretation of fatigue lifetime prediction by machine learning modeling in piston aluminum alloys under different manufacturing and loading conditions." Frattura ed Integrità Strutturale 18, no. 68 (2024): 357–70. http://dx.doi.org/10.3221/igf-esis.68.24.

Full text
Abstract:
Various input variables, including corrosion time, fretting force, stress, lubrication, heat treatment, and nano-particles, were evaluated by modeling stress-controlled fatigue lifetimes of the AlSi12CuNiMg aluminum alloy used in engine pistons with different machine learning (ML) techniques. Bending fatigue experiments were conducted through cyclic loading with zero mean stress, and the experimental data were then predicted with five different ML-based models. Moreover, once the optimal ML prediction model was found, it was analyzed using the Shapley additive explanation (SHAP) values method. The results illustrated that extreme gradient boosting (XGBoost) produced the most accurate estimates, with average training coefficients of determination of at least 63% and 90% for the fatigue lifetime and its logarithmic value, respectively. The SHAP-value interpretation of the XGBoost model revealed that fretting force, stress, and corrosion time were, in that order, the most significant inputs for estimating the logarithmic values of fatigue lifetime.
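A minimal sketch of this kind of pipeline, under assumed synthetic data: an XGBoost regressor is fit to the logarithm of fatigue lifetime, coefficients of determination are compared on the raw and logarithmic scales, and inputs are ranked by mean absolute SHAP value. The relationship generating the toy target is invented for illustration.

```python
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = pd.DataFrame({"corrosion_time": rng.uniform(0, 400, 200),
                  "fretting_force": rng.uniform(0, 100, 200),
                  "stress": rng.uniform(50, 250, 200)})
# Toy log10(lifetime) with noise; real values would come from bending fatigue tests.
log_life = 6 - 0.01 * X["stress"] - 0.005 * X["fretting_force"] + rng.normal(0, 0.2, 200)

model = xgb.XGBRegressor(n_estimators=300, random_state=0).fit(X, log_life)
pred = model.predict(X)

print("R2 (log lifetime):", r2_score(log_life, pred))
print("R2 (lifetime):", r2_score(10 ** log_life, 10 ** pred))  # back-transformed scale

# Rank inputs by mean |SHAP| for the logarithmic target.
sv = shap.TreeExplainer(model).shap_values(X)
print(dict(zip(X.columns, np.abs(sv).mean(axis=0).round(3))))
```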
APA, Harvard, Vancouver, ISO, and other styles
41

Liu, Wei, Zhangxin Chen, Yuan Hu, and Jun Zhang. "Forecasting pipeline safety and remaining life with machine learning methods and SHAP interaction values." International Journal of Pressure Vessels and Piping 205 (October 2023): 105000. http://dx.doi.org/10.1016/j.ijpvp.2023.105000.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Liu, Peili, Song Han, and Na Rong. "Frequency stability prediction of renewable energy penetrated power systems using CoAtNet and SHAP values." Engineering Applications of Artificial Intelligence 123 (August 2023): 106403. http://dx.doi.org/10.1016/j.engappai.2023.106403.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Jose, Blessy Jayaron, Preeti Jain, and T. Raja Rani. "A Data-driven Approach to Understanding Energy Losses using COMSOL Simulation and SHAP Values." WSEAS TRANSACTIONS ON ENVIRONMENT AND DEVELOPMENT 21 (May 26, 2025): 552–73. https://doi.org/10.37394/232015.2025.21.46.

Full text
Abstract:
This study investigates energy losses in crude oil pipelines to optimize design, improve efficiency, and enhance safety. Pipelines made of AISI 1020 steel were modeled as three equal-length sections with varying diameters to replicate real-world conditions. COMSOL Multiphysics simulations were conducted to analyze pipeline behavior under different heat and flow scenarios. Temperature-related challenges were a primary focus due to their impact on energy dissipation. A quantile-loss prediction approach identified the best-performing models: based on machine learning model metrics and quantile loss, the best prediction model was selected for each output. For instance, for the average head loss (HL_Avg), the tuned Random Forest model emerged as the best and most balanced model, excelling across all metrics and quantiles while offering high accuracy and minimizing overfitting risk. Further, the analysis of SHAP values to assess the influence of key parameters such as fluid velocity, temperature gradients, and pipeline geometry is a novel approach that enhances the interpretability of the model predictions. The findings emphasize the significance of model selection in energy-loss prediction, demonstrating how effective forecasting enhances pipeline efficiency, reduces costs, and supports environmental sustainability.
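The quantile-loss screening step can be illustrated with a short sketch on assumed, generic data: quantile gradient-boosting models are fit at several quantiles and scored with the pinball loss, the standard metric for quantile regression.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

def pinball_loss(y_true, y_pred, q):
    # Quantile (pinball) loss: under-prediction is penalized by q, over-prediction by 1-q.
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

# Stand-in for simulated outputs such as HL_Avg.
X, y = make_regression(n_samples=400, n_features=4, noise=10.0, random_state=0)

for q in (0.1, 0.5, 0.9):
    model = GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X, y)
    print(f"q={q}: pinball loss = {pinball_loss(y, model.predict(X), q):.2f}")
```

Comparing such per-quantile losses across candidate models is one way to identify the most balanced predictor for each output, as the abstract describes.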
APA, Harvard, Vancouver, ISO, and other styles
44

Hanani, Ahmad A., Turker Berk Donmez, Mustafa Kutlu, and Mohammed Mansour. "Predicting thyroid cancer recurrence using supervised CatBoost: A SHAP-based explainable AI approach." Medicine 104, no. 22 (2025): e42667. https://doi.org/10.1097/md.0000000000042667.

Full text
Abstract:
Recurrence prediction in well-differentiated thyroid cancer remains a clinical challenge, necessitating more accurate and interpretable predictive models. This study investigates the use of a supervised CatBoost classifier to predict recurrence in well-differentiated thyroid cancer patients, comparing its performance against other ensemble models and employing Shapley Additive Explanations (SHAP) to enhance interpretability. A dataset comprising 383 patients with diverse demographic, clinical, and pathological variables was utilized. Data preprocessing steps included handling missing values and encoding categorical features. The dataset was split into training and testing sets using a 70:30 ratio. Model performance was evaluated using accuracy and the area under the receiver operating characteristic curve (AUROC). A comparative analysis was conducted with other ensemble methods, such as Extra Trees, LightGBM, and XGBoost. SHAP analysis was employed to determine feature importance and assess model interpretability at both the global and local levels. The supervised CatBoost classifier demonstrated superior performance, achieving an accuracy of 97% and an AUROC of 0.99, outperforming the competing models. SHAP analysis revealed that treatment response (SHAP value: 2.077), risk stratification (SHAP value: 0.859), and lymph node involvement (N) (SHAP value: 0.596) were the most influential predictors of recurrence. Local SHAP analyses provided insight into individual predictions, highlighting that misclassification often resulted from overemphasizing a single factor while overlooking other clinically relevant indicators. The supervised CatBoost classifier demonstrated high predictive performance and enhanced interpretability through SHAP analysis. These findings underscore the importance of incorporating multiple predictive factors to improve recurrence risk assessment. While the model shows promise in personalizing thyroid cancer management, further validation on larger, more diverse datasets is warranted to ensure robustness.
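For readers unfamiliar with CatBoost's SHAP support, the sketch below uses the library's built-in ShapValues importance type on hypothetical stand-in features; it is not the authors' implementation, and the data are synthetic.

```python
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier, Pool
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame({"treatment_response": rng.integers(0, 4, 383),
                  "risk": rng.integers(0, 3, 383),
                  "lymph_nodes_N": rng.integers(0, 2, 383)})
y = (X["treatment_response"] + rng.normal(0, 1, 383) > 2).astype(int)  # toy recurrence label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)  # 70:30 split
model = CatBoostClassifier(iterations=300, verbose=False).fit(X_tr, y_tr)

# Returns one row per patient: per-feature SHAP values plus a final expected-value column.
shap_matrix = model.get_feature_importance(Pool(X_te, y_te), type="ShapValues")
mean_abs = np.abs(shap_matrix[:, :-1]).mean(axis=0)
print(dict(zip(X.columns, mean_abs.round(3))))
```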
APA, Harvard, Vancouver, ISO, and other styles
45

Liu, Shuxian, Yang Liu, Zhigang Chu, et al. "Evaluation of Tropical Cyclone Disaster Loss Using Machine Learning Algorithms with an eXplainable Artificial Intelligence Approach." Sustainability 15, no. 16 (2023): 12261. http://dx.doi.org/10.3390/su151612261.

Full text
Abstract:
In the context of global warming, tropical cyclones (TCs) have garnered significant attention as one of the most severe natural disasters in China, making the assessment of disaster losses particularly important. This study aims to evaluate TC disaster loss (TCDL) using machine learning (ML) algorithms and to identify the impact of specific feature factors on the model's predictions with an eXplainable Artificial Intelligence (XAI) approach, SHapley Additive exPlanations (SHAP). The results show that LightGBM outperforms Random Forest (RF), Support Vector Machine (SVM), and Naive Bayes (NB) for estimating TCDL grades, achieving the highest accuracy, 0.86. According to the SHAP values, the three most important factors in the LightGBM classifier model are the proportion of stations with rainfall exceeding 50 mm (ProRain), maximum wind speed (MaxWind), and maximum daily rainfall (MaxRain). Specifically, in the estimation of the high TCDL grade, events characterized by MaxWind exceeding 30 m/s, MaxRain exceeding 200 mm, and ProRain exceeding 30% tend to receive positive SHAP values and thus show a higher predicted susceptibility to TC disaster. This study offers a valuable tool for decision-makers to develop scientific strategies in the risk management of TC disaster.
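A sketch of the LightGBM-plus-SHAP setup on toy data: LightGBM can emit SHAP values natively via pred_contrib=True, producing one block of per-feature contributions (plus a bias term) per TCDL grade. The feature names follow the abstract, while the data and grade thresholds are invented for illustration.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(0)
X = pd.DataFrame({"MaxWind": rng.uniform(10, 60, 500),
                  "MaxRain": rng.uniform(0, 400, 500),
                  "ProRain": rng.uniform(0, 100, 500)})
grade = np.digitize(X["MaxWind"] + 0.1 * X["MaxRain"], [30, 55, 80])  # toy TCDL grades 0-3

model = lgb.LGBMClassifier(n_estimators=200, random_state=0).fit(X, grade)

# pred_contrib=True returns, per class, one SHAP value per feature plus a bias column.
contrib = model.predict(X, pred_contrib=True)
n_feat = X.shape[1]
high_grade = contrib[:, -(n_feat + 1):-1]  # contributions toward the highest grade
print("mean |SHAP| toward high TCDL:",
      dict(zip(X.columns, np.abs(high_grade).mean(axis=0).round(3))))
```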
APA, Harvard, Vancouver, ISO, and other styles
46

Sarder Abdulla Al Shiam, Md Mahdi Hasan, Md Jubair Pantho, et al. "Credit Risk Prediction Using Explainable AI." Journal of Business and Management Studies 6, no. 2 (2024): 61–66. http://dx.doi.org/10.32996/jbms.2024.6.2.6.

Full text
Abstract:
Despite advancements in machine-learning prediction techniques, the majority of lenders continue to rely on conventional methods for predicting credit defaults, largely because newer models lack transparency and explainability. This reluctance to embrace newer approaches persists because there is a compelling need for credit default prediction models to be explainable. This study introduces credit default prediction models employing several tree-based ensemble methods, with the most effective model, XGBoost, being further utilized to enhance explainability. We implement SHapley Additive exPlanations (SHAP) in ML-based credit scoring models using data from the US-based P2P lending platform Lending Club. Detailed discussions of the results, along with explanations using SHAP values, are also provided. The explainability provided by Shapley values makes the model applicable to a broad spectrum of industry applications.
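A minimal local-explanation sketch in the spirit of this study, assuming synthetic data and illustrative column names rather than Lending Club's actual schema: a single applicant's prediction is decomposed into a base log-odds value plus per-feature SHAP contributions.

```python
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from sklearn.datasets import make_classification

Xa, y = make_classification(n_samples=2000, n_features=4, random_state=0)
X = pd.DataFrame(Xa, columns=["loan_amnt", "int_rate", "dti", "annual_inc"])

model = xgb.XGBClassifier(n_estimators=300, random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)

# Explain one applicant: base value plus per-feature contributions (log-odds scale).
i = 0
sv = explainer.shap_values(X.iloc[[i]])[0]
print("base log-odds:", explainer.expected_value)
for name, v in sorted(zip(X.columns, sv), key=lambda t: -abs(t[1])):
    print(f"{name}: {v:+.3f}")
```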
APA, Harvard, Vancouver, ISO, and other styles
47

Cynthia, C., Debayani Ghosh, and Gopal Krishna Kamath. "Detection of DDoS Attacks Using SHAP-Based Feature Reduction." International Journal of Machine Learning 13, no. 4 (2023): 173–80. http://dx.doi.org/10.18178/ijml.2023.13.4.1147.

Full text
Abstract:
Machine learning techniques are widely used to protect cyberspace against malicious attacks. In this paper, we propose a machine learning-based intrusion detection system to mitigate Distributed Denial-of-Service (DDoS) attacks, one of the most prevalent attack types, which disrupt the normal traffic of the targeted network. The model's predictions are interpreted using the SHapley Additive exPlanations (SHAP) technique, which also identifies the most essential features, namely those with the highest Shapley values. For the proposed model, the CICIDS2017 dataset from Kaggle is used for training the classification algorithms. The top features selected by the SHAP technique are used to train a Conditional Tabular Generative Adversarial Network (CTGAN) for synthetic data generation. The CTGAN-generated data are then used to train prediction models such as the Support Vector Classifier (SVC), Random Forest (RF), and Naïve Bayes (NB). The performance of the models is characterized using a confusion matrix. The experimental results show that the attack detection rate improves significantly after applying the SHAP feature selection technique.
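The SHAP-based feature reduction step might look like the following sketch (with synthetic data standing in for CICIDS2017 flow features): rank features by mean absolute SHAP value, keep the top k, and retrain. The CTGAN augmentation stage is omitted here.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

Xa, y = make_classification(n_samples=1000, n_features=20, n_informative=6, random_state=0)
X = pd.DataFrame(Xa, columns=[f"flow_feat_{i}" for i in range(20)])  # stand-in flow features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

full = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
sv = shap.TreeExplainer(full).shap_values(X_tr)
# Older shap returns a per-class list, newer a 3D array; take attack-class contributions.
sv = sv[1] if isinstance(sv, list) else (sv[..., 1] if sv.ndim == 3 else sv)

top_k = list(X.columns[np.argsort(-np.abs(sv).mean(axis=0))[:8]])
reduced = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr[top_k], y_tr)
print("kept:", top_k)
print("full acc:", full.score(X_te, y_te), "reduced acc:", reduced.score(X_te[top_k], y_te))
```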
APA, Harvard, Vancouver, ISO, and other styles
48

Cao, Mengru, and Chunhui Li. "Prediction of In-Hospital Mortality in Non-ST-Segment Elevation Myocardial Infarction, Based on Interpretable Machine Learning." Applied Sciences 15, no. 8 (2025): 4226. https://doi.org/10.3390/app15084226.

Full text
Abstract:
This study sought to establish machine learning models for forecasting in-hospital mortality in non-ST-segment elevation myocardial infarction (NSTEMI) patients, with a focus on model interpretability using Shapley Additive Explanations (SHAP). Data were gathered from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. The Synthetic Minority Over-sampling Technique (SMOTE) combined with Edited Nearest Neighbors (ENN) was used to address class imbalance. Four machine learning algorithms were employed: Adaptive Boosting (AdaBoost), Random Forest (RF), Gradient Boosting Decision Trees (GBDT), and eXtreme Gradient Boosting (XGBoost). SHAP was utilized to improve transparency and credibility. The all-features RF model demonstrated the best performance, with an accuracy of 0.8513, a precision of 0.9016, and an AUC of 0.8903. The SHAP summary plot for the RF model revealed that the Acute Physiology Score III, lactate dehydrogenase, and lactate were the three most crucial characteristics, with higher values indicating greater risk. The study demonstrates the applicability of machine learning, particularly RF, in predicting in-hospital mortality for NSTEMI patients, with the use of SHAP enhancing model interpretability and providing clinicians with clearer insights into feature contributions.
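The resampling step can be sketched with the imbalanced-learn library, which combines SMOTE over-sampling with Edited Nearest Neighbors cleaning in one transformer; the data here are synthetic stand-ins for the MIMIC-IV cohort.

```python
from collections import Counter

from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Skewed labels standing in for a cohort where in-hospital deaths are rare.
X, y = make_classification(n_samples=2000, weights=[0.93], random_state=0)
print("before:", Counter(y))

# SMOTE oversamples the minority class, then ENN removes noisy boundary points.
X_res, y_res = SMOTEENN(random_state=0).fit_resample(X, y)
print("after:", Counter(y_res))

model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_res, y_res)
```

A SHAP summary plot would then be computed on this fitted model, as the abstract describes.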
APA, Harvard, Vancouver, ISO, and other styles
49

Hartati, Hartati, Rudy Herteno, Mohammad Reza Faisal, Fatma Indriani, and Friska Abadi. "Recursive Feature Elimination Optimization Using Shapley Additive Explanations in Software Defect Prediction with LightGBM Classification." JURNAL INFOTEL 17, no. 1 (2025): 1–16. https://doi.org/10.20895/infotel.v17i1.1159.

Full text
Abstract:
A software defect is an issue that prevents software from functioning properly; such defects stem from mistakes made during the software development process. Software defect prediction is performed to help ensure that software is defect-free, and machine learning classification is used to classify defective modules. To improve the classification model, it is necessary to select the best features from the dataset. Recursive Feature Elimination (RFE) is a feature selection method, and Shapley Additive Explanations (SHAP) is a method that can optimize feature selection algorithms to produce better results. In this research, the popular boosting algorithm LightGBM was selected as the classifier to predict software defects, while RFE-SHAP was used for feature selection to identify the best subset of features. The results show that RFE-SHAP feature selection slightly outperforms RFE, with average AUC values of 0.864 and 0.858, respectively. Moreover, RFE-SHAP produces more statistically significant results than RFE: the t-test for RFE feature selection gives P = 0.039 < α = 0.05 with t = 3.011 exceeding the critical value t_table = 2.776, whereas the t-test for RFE-SHAP feature selection gives P = 0.000 < α = 0.05 with t = 11.91 exceeding t_table = 2.776.
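One plausible reading of RFE-SHAP, sketched below under that assumption: classic recursive feature elimination, but with the feature ranking taken from mean absolute SHAP values of a LightGBM classifier rather than from built-in importances. Data and feature names are synthetic stand-ins for software metrics.

```python
import numpy as np
import pandas as pd
import shap
import lightgbm as lgb
from sklearn.datasets import make_classification

Xa, y = make_classification(n_samples=600, n_features=12, n_informative=5, random_state=0)
X = pd.DataFrame(Xa, columns=[f"metric_{i}" for i in range(12)])  # stand-in software metrics

features = list(X.columns)
while len(features) > 5:  # target subset size
    model = lgb.LGBMClassifier(n_estimators=150, random_state=0).fit(X[features], y)
    sv = shap.TreeExplainer(model).shap_values(X[features])
    sv = sv[1] if isinstance(sv, list) else sv  # defect-class contributions on older shap
    weakest = features[int(np.argmin(np.abs(sv).mean(axis=0)))]
    features.remove(weakest)  # eliminate the weakest feature and refit, as in classic RFE

print("selected features:", features)
```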
APA, Harvard, Vancouver, ISO, and other styles
50

Thanathamathee, Putthiporn, Siriporn Sawangarreerak, Siripinyo Chantamunee, and Dinna Nina Mohd Nizam. "SHAP-Instance Weighted and Anchor Explainable AI: Enhancing XGBoost for Financial Fraud Detection." Emerging Science Journal 8, no. 6 (2024): 2404–30. https://doi.org/10.28991/esj-2024-08-06-016.

Full text
Abstract:
This research aims to enhance financial fraud detection by integrating SHAP-Instance Weighting and Anchor Explainable AI with XGBoost, addressing the challenges of class imbalance and model interpretability. The study extends SHAP values beyond feature importance to instance weighting, assigning higher weights to more influential instances and thereby focusing model learning on critical samples. It combines this with Anchor Explainable AI to generate interpretable if-then rules explaining model decisions. The approach is applied to a dataset of financial statements from companies listed on the Stock Exchange of Thailand. The method significantly improves fraud detection performance, achieving perfect recall for fraudulent instances and substantial gains in accuracy while maintaining high precision, and it effectively differentiates between non-fraudulent, fraudulent, and grey-area cases. The generated rules provide transparent insights into model decisions, offering nuanced guidance for risk management and compliance. This research introduces instance weighting based on SHAP values as a novel concept in financial fraud detection. By simultaneously addressing class imbalance and interpretability, the integrated approach outperforms traditional methods and sets a new standard in the field, providing a robust, explainable solution that reduces false positives and increases trust in fraud detection models.
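A deliberately hypothetical sketch of the instance-weighting idea, one plausible reading of the abstract rather than the authors' exact formulation: each training row's influence is summarized as the sum of its absolute SHAP values, normalized, and passed to XGBoost as a sample weight on refit.

```python
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from sklearn.datasets import make_classification

# Imbalanced toy labels standing in for fraud/non-fraud; columns mimic statement ratios.
Xa, y = make_classification(n_samples=1500, weights=[0.9], random_state=0)
X = pd.DataFrame(Xa, columns=[f"ratio_{i}" for i in range(Xa.shape[1])])

base = xgb.XGBClassifier(n_estimators=200, random_state=0).fit(X, y)
sv = shap.TreeExplainer(base).shap_values(X)

# Instance influence = sum of absolute SHAP values across its features,
# normalized so the weights average to 1; influential rows get more learning weight.
weights = np.abs(sv).sum(axis=1)
weights = weights / weights.mean()

weighted = xgb.XGBClassifier(n_estimators=200, random_state=0).fit(X, y, sample_weight=weights)
```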
APA, Harvard, Vancouver, ISO, and other styles