To see the other types of publications on this topic, follow the link: Shapley Additive Explanations.

Journal articles on the topic 'Shapley Additive Explanations'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Shapley Additive Explanations.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Vega García, María, and José L. Aznarte. "Shapley additive explanations for NO2 forecasting." Ecological Informatics 56 (March 2020): 101039. http://dx.doi.org/10.1016/j.ecoinf.2019.101039.

2

Antwarg, Liat, Ronnie Mindlin Miller, Bracha Shapira, and Lior Rokach. "Explaining anomalies detected by autoencoders using Shapley Additive Explanations." Expert Systems with Applications 186 (December 2021): 115736. http://dx.doi.org/10.1016/j.eswa.2021.115736.

3

Ogami, Chika, Yasuhiro Tsuji, Hiroto Seki, Hideaki Kawano, Hideto To, Yoshiaki Matsumoto, and Hiroyuki Hosono. "An artificial neural network−pharmacokinetic model and its interpretation using Shapley additive explanations." CPT: Pharmacometrics & Systems Pharmacology 10, no. 7 (May 27, 2021): 760–68. http://dx.doi.org/10.1002/psp4.12643.

4

Tideman, Leonoor E. M., Lukasz G. Migas, Katerina V. Djambazova, Nathan Heath Patterson, Richard M. Caprioli, Jeffrey M. Spraggins, and Raf Van de Plas. "Automated biomarker candidate discovery in imaging mass spectrometry data through spatially localized Shapley additive explanations." Analytica Chimica Acta 1177 (September 2021): 338522. http://dx.doi.org/10.1016/j.aca.2021.338522.

5

Wieland, Ralf, Tobia Lakes, and Claas Nendel. "Using Shapley additive explanations to interpret extreme gradient boosting predictions of grassland degradation in Xilingol, China." Geoscientific Model Development 14, no. 3 (March 16, 2021): 1493–510. http://dx.doi.org/10.5194/gmd-14-1493-2021.

Abstract:
Machine learning (ML) and data-driven approaches are increasingly used in many research areas. Extreme gradient boosting (XGBoost) is a tree boosting method that has evolved into a state-of-the-art approach for many ML challenges. However, it has rarely been used in simulations of land use change so far. Xilingol, a typical region for research on serious grassland degradation and its drivers, was selected as a case study to test whether XGBoost can provide alternative insights that conventional land-use models are unable to generate. A set of 20 drivers was analysed using XGBoost, involving four alternative sampling strategies, and SHAP (Shapley additive explanations) to interpret the results of the purely data-driven approach. The results indicated that, with three of the sampling strategies (over-balanced, balanced, and imbalanced), XGBoost achieved similar and robust simulation results. SHAP values were useful for analysing the complex relationship between the different drivers of grassland degradation. Four drivers accounted for 99 % of the grassland degradation dynamics in Xilingol. These four drivers were spatially allocated, and a risk map of further degradation was produced. The limitations of using XGBoost to predict future land-use change are discussed.
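The SHAP interpretation step shared by this entry and several others in the list rests on the classical Shapley value from cooperative game theory. As an illustrative sketch only (a toy additive model and brute-force coalition enumeration, not the efficient TreeSHAP algorithm the shap library applies to XGBoost), exact Shapley values can be computed like this:

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction, by enumerating feature
    coalitions. Features outside a coalition are set to their baseline
    value. Exponential in the number of features, so only feasible for
    toy examples; TreeSHAP exploits tree structure to avoid this."""
    n = len(x)
    idx = list(range(n))
    phi = [0.0] * n
    for i in idx:
        others = [j for j in idx if j != i]
        for k in range(n):
            for coal in combinations(others, k):
                # Shapley kernel weight for a coalition of this size.
                weight = factorial(len(coal)) * factorial(n - len(coal) - 1) / factorial(n)
                with_i = [x[j] if (j in coal or j == i) else baseline[j] for j in idx]
                without_i = [x[j] if j in coal else baseline[j] for j in idx]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Hypothetical "degradation score" model: one linear driver plus an
# interaction between two others (names and values are made up).
model = lambda v: 3 * v[0] + 2 * v[1] * v[2]
x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, base)
# Efficiency property: the attributions sum to model(x) - model(base).
```

The interaction term is split evenly between the two features involved, which is exactly the symmetry property that makes SHAP attractive for untangling correlated drivers.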
6

Knapič, Samanta, Avleen Malhi, Rohit Saluja, and Kary Främling. "Explainable Artificial Intelligence for Human Decision Support System in the Medical Domain." Machine Learning and Knowledge Extraction 3, no. 3 (September 19, 2021): 740–70. http://dx.doi.org/10.3390/make3030037.

Abstract:
In this paper, we present the potential of Explainable Artificial Intelligence methods for decision support in medical image analysis scenarios. Using three types of explainable methods applied to the same medical image data set, we aimed to improve the comprehensibility of the decisions provided by the Convolutional Neural Network (CNN). In vivo gastral images obtained by a video capsule endoscopy (VCE) were the subject of visual explanations, with the goal of increasing health professionals’ trust in black-box predictions. We implemented two post hoc interpretable machine learning methods, called Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), and an alternative explanation approach, the Contextual Importance and Utility (CIU) method. The produced explanations were assessed by human evaluation. We conducted three user studies based on explanations provided by LIME, SHAP and CIU. Users from different non-medical backgrounds carried out a series of tests in a web-based survey setting and stated their experience and understanding of the given explanations. Three user groups (n = 20, 20, 20) with three distinct forms of explanations were quantitatively analyzed. We found that, as hypothesized, the CIU-explainable method performed better than both LIME and SHAP methods in terms of improving support for human decision-making and being more transparent and thus understandable to users. Additionally, CIU outperformed LIME and SHAP by generating explanations more rapidly. Our findings suggest that there are notable differences in human decision-making between various explanation support settings. In line with that, we present three potential explainable methods that, with future improvements in implementation, can be generalized to different medical data sets and can provide effective decision support to medical experts.
7

Mangalathu, Sujith, Seong-Hoon Hwang, and Jong-Su Jeon. "Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach." Engineering Structures 219 (September 2020): 110927. http://dx.doi.org/10.1016/j.engstruct.2020.110927.

8

Pokharel, Sugam, Pradip Sah, and Deepak Ganta. "Improved Prediction of Total Energy Consumption and Feature Analysis in Electric Vehicles Using Machine Learning and Shapley Additive Explanations Method." World Electric Vehicle Journal 12, no. 3 (June 29, 2021): 94. http://dx.doi.org/10.3390/wevj12030094.

Abstract:
Electric vehicles (EVs) have emerged as the green energy alternative for conventional vehicles. While various governments promote EVs, people feel “range anxiety” because of their limited driving range or charge capacity. A limited number of charging stations are available, which results in a strong demand for predicting energy consumed by EVs. In this paper, machine learning (ML) models such as multiple linear regression (MLR), extreme gradient boosting (XGBoost), and support vector regression (SVR) were used to investigate the total energy consumption (TEC) by the EVs. The independent variables used for the study include changing real-life situations or external parameters, such as trip distance, tire type, driving style, power, odometer reading, EV model, city, motorway, country roads, air conditioning, and park heating. We compared the ML models’ performance along with the error analysis. A pairwise correlation study showed that trip distance has a high correlation coefficient (0.87) with TEC. XGBoost had better prediction accuracy (~92%) or R2 (0.92). Trip distance, power, heating, and odometer reading were the most important features influencing the TEC, identified using the Shapley additive explanations method.
9

Machado Poletti Valle, Luis Fernando, Camille Avestruz, David J. Barnes, Arya Farahi, Erwin T. Lau, and Daisuke Nagai. "Shaping the gas: understanding gas shapes in dark matter haloes with interpretable machine learning." Monthly Notices of the Royal Astronomical Society 507, no. 1 (August 6, 2021): 1468–84. http://dx.doi.org/10.1093/mnras/stab2252.

Abstract:
The non-spherical shapes of dark matter and gas distributions introduce systematic uncertainties that affect observable–mass relations and selection functions of galaxy groups and clusters. However, the triaxial gas distributions depend on the non-linear physical processes of halo formation histories and baryonic physics, which are challenging to model accurately. In this study, we explore a machine learning approach for modelling the dependence of gas shapes on dark matter and baryonic properties. With data from the IllustrisTNG hydrodynamical cosmological simulations, we develop a machine learning pipeline that applies XGBoost, an implementation of gradient-boosted decision trees, to predict radial profiles of gas shapes from halo properties. We show that XGBoost models can accurately predict gas shape profiles in dark matter haloes. We also explore model interpretability with the SHapley Additive exPlanations (shap), a method that identifies the most predictive properties at different halo radii. We find that baryonic properties best predict gas shapes in halo cores, whereas dark matter shapes are the main predictors in the halo outskirts. This work demonstrates the power of interpretable machine learning in modelling observable properties of dark matter haloes in the era of multiwavelength cosmological surveys.
10

Manikis, Georgios C., Georgios S. Ioannidis, Loizos Siakallis, Katerina Nikiforaki, Michael Iv, Diana Vozlic, Katarina Surlan-Popovic, Max Wintermark, Sotirios Bisdas, and Kostas Marias. "Multicenter DSC–MRI-Based Radiomics Predict IDH Mutation in Gliomas." Cancers 13, no. 16 (August 5, 2021): 3965. http://dx.doi.org/10.3390/cancers13163965.

Abstract:
To address the current lack of dynamic susceptibility contrast magnetic resonance imaging (DSC–MRI)-based radiomics to predict isocitrate dehydrogenase (IDH) mutations in gliomas, we present a multicenter study that featured an independent exploratory set for radiomics model development and external validation using two independent cohorts. The maximum performance of the IDH mutation status prediction on the validation set had an accuracy of 0.544 (Cohen’s kappa: 0.145, F1-score: 0.415, area under the curve-AUC: 0.639, sensitivity: 0.733, specificity: 0.491), which significantly improved to an accuracy of 0.706 (Cohen’s kappa: 0.282, F1-score: 0.474, AUC: 0.667, sensitivity: 0.6, specificity: 0.736) when dynamic-based standardization of the images was performed prior to the radiomics. Model explainability using local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP) revealed potential intuitive correlations between the IDH–wildtype increased heterogeneity and the texture complexity. These results strengthened our hypothesis that DSC–MRI radiogenomics in gliomas hold the potential to provide increased predictive performance from models that generalize well and provide understandable patterns between IDH mutation status and the extracted features toward enabling the clinical translation of radiogenomics in neuro-oncology.
11

He, Jian, Yong Hao, and Xiaoqiong Wang. "An Interpretable Aid Decision-Making Model for Flag State Control Ship Detention Based on SMOTE and XGBoost." Journal of Marine Science and Engineering 9, no. 2 (February 4, 2021): 156. http://dx.doi.org/10.3390/jmse9020156.

Abstract:
The reasonable decision of ship detention plays a vital role in flag state control (FSC). Machine learning algorithms can be applied as aid tools for identifying ship detention. In this study, we propose a novel interpretable ship detention decision-making model based on machine learning, termed SMOTE-XGBoost-Ship detention model (SMO-XGB-SD), using the extreme gradient boosting (XGBoost) algorithm and the synthetic minority oversampling technique (SMOTE) algorithm to identify whether a ship should be detained. Our verification results show that the SMO-XGB-SD algorithm outperforms random forest (RF), support vector machine (SVM), and logistic regression (LR) algorithm. In addition, the new algorithm also provides a reasonable interpretation of model performance and highlights the most important features for identifying ship detention using the Shapley additive explanations (SHAP) algorithm. The SMO-XGB-SD model provides an effective basis for aiding decisions on ship detention by inland flag state control officers (FSCOs) and the ship safety management of ship operating companies, as well as training services for new FSCOs in maritime organizations.
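The SMOTE step used in this entry (and in entry 19 below) to rebalance the rare detained-ship class follows a simple interpolation rule: each synthetic minority sample lies on the segment between a minority instance and one of its nearest minority-class neighbours. A minimal sketch of that rule, with made-up data (not the imbalanced-learn implementation the authors may have used):

```python
import numpy as np

def smote_sample(X_min, n_new, k=3, rng=None):
    """Generate n_new synthetic minority samples. Each new point is
    x + u * (neighbour - x) for random u in [0, 1], where neighbour is
    one of x's k nearest minority-class neighbours (brute-force kNN)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise distances within the minority class; exclude self-matches.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbours
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))   # pick a random minority sample
        j = nn[i, rng.integers(k)]     # pick one of its neighbours
        u = rng.random()
        synthetic.append(X_min[i] + u * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Toy 2-D minority class (hypothetical ship-feature vectors).
X_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
X_new = smote_sample(X_min, n_new=6, k=2, rng=0)
```

Because new points interpolate between real minority samples, the oversampled class stays inside the convex hull of the observed minority data rather than simply duplicating records.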
12

Nadaf, Ali, Sebas Eliëns, and Xin Miao. "Interpretable-Machine-Learning Evidence for Importance and Optimum of Learning Time." International Journal of Information and Education Technology 11, no. 10 (2021): 444–49. http://dx.doi.org/10.18178/ijiet.2021.11.10.1548.

Abstract:
This study uses a machine learning technique, a boosted tree model, to relate the student cognitive achievement in the 2018 data from the Programme for International Student Assessment (PISA) to other features related to the student learning process, capturing the complex and nonlinear relationships in the data. The SHapley Additive exPlanations (SHAP) approach is subsequently used to explain the complexity of the model. It reveals the relative importance of each of the features in predicting cognitive achievement. We find that instruction time comes out as an important predictor, but with a nonlinear relationship between its value and the contribution to the prediction. We find that a large weekly learning time of more than 35 hours is associated with less positive or even negative effect on the predicted outcome. We discuss how this method can possibly be used to signal problems in the student population related to learning time or other features.
13

Carlsson, Leo S., Peter B. Samuelsson, and Pär G. Jönsson. "Modeling the Effect of Scrap on the Electrical Energy Consumption of an Electric Arc Furnace." Processes 8, no. 9 (August 26, 2020): 1044. http://dx.doi.org/10.3390/pr8091044.

Abstract:
The melting time of scrap is a factor that affects the Electrical Energy (EE) consumption of the Electric Arc Furnace (EAF) process. The EE consumption itself stands for most of the total energy consumption during the process. Three distinct representations of scrap, based partly on the apparent density and shape of scrap, were created to investigate the effect of scrap on the accuracy of a statistical model predicting the EE consumption of an EAF. Shapley Additive Explanations (SHAP) was used as a tool to investigate the effects by each scrap category on each prediction of a selected model. The scrap representation based on the shape of scrap consistently resulted in the best performing models while all models using any of the scrap representations performed better than the ones without any scrap representation. These results were consistent for all four distinct and separately used cleaning strategies on the data set governing the models. In addition, some of the main scrap categories contributed to the model prediction of EE in accordance with the expectations and experience of the plant engineers. The results provide significant evidence that a well-chosen scrap categorization is important to improve a statistical model predicting the EE and that experience on the specific EAF under study is essential to evaluate the practical usefulness of the model.
14

Merembayev, Timur, Darkhan Kurmangaliyev, Bakhbergen Bekbauov, and Yerlan Amanbek. "A Comparison of Machine Learning Algorithms in Predicting Lithofacies: Case Studies from Norway and Kazakhstan." Energies 14, no. 7 (March 29, 2021): 1896. http://dx.doi.org/10.3390/en14071896.

Abstract:
Defining distinctive areas of the physical properties of rocks plays an important role in reservoir evaluation and hydrocarbon production as core data are challenging to obtain from all wells. In this work, we study the evaluation of lithofacies values using the machine learning algorithms in the determination of classification from various well log data of Kazakhstan and Norway. We also use the wavelet-transformed data in machine learning algorithms to identify geological properties from the well log data. Numerical results are presented for the multiple oil and gas reservoir data which contain more than 90 released wells from Norway and 10 wells from the Kazakhstan field. We compared the machine learning algorithms including KNN, Decision Tree, Random Forest, XGBoost, and LightGBM. The evaluation of the model score is conducted by using metrics such as accuracy, Hamming loss, and penalty matrix. In addition, the influence of the dataset features on the prediction is investigated using the machine learning algorithms. The result of research shows that the Random Forest model has the best score among considered algorithms. In addition, the results are consistent with outcome of the SHapley Additive exPlanations (SHAP) framework.
15

Kumar, Akshi, Shubham Dikshit, and Victor Hugo C. Albuquerque. "Explainable Artificial Intelligence for Sarcasm Detection in Dialogues." Wireless Communications and Mobile Computing 2021 (July 2, 2021): 1–13. http://dx.doi.org/10.1155/2021/2939334.

Abstract:
Sarcasm detection in dialogues has been gaining popularity among natural language processing (NLP) researchers with the increased use of conversational threads on social media. Capturing the knowledge of the domain of discourse, context propagation during the course of dialogue, and situational context and tone of the speaker are some important features to train the machine learning models for detecting sarcasm in real time. As situational comedies vibrantly represent human mannerism and behaviour in everyday real-life situations, this research demonstrates the use of an ensemble supervised learning algorithm to detect sarcasm in the benchmark dialogue dataset, MUStARD. The punch-line utterance and its associated context are taken as features to train the eXtreme Gradient Boosting (XGBoost) method. The primary goal is to predict sarcasm in each utterance of the speaker using the chronological nature of a scene. Further, it is vital to prevent model bias and help decision makers understand how to use the models in the right way. Therefore, as a twin goal of this research, we make the learning model used for conversational sarcasm detection interpretable. This is done using two post hoc interpretability approaches, Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive exPlanations (SHAP), to generate explanations for the output of a trained classifier. The classification results clearly depict the importance of capturing the intersentence context to detect sarcasm in conversational threads. The interpretability methods show the words (features) that influence the decision of the model the most and help the user understand how the model is making the decision for detecting sarcasm in dialogues.
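LIME, the second post-hoc method used in this entry (and in entries 6, 10, and 18), fits a simple weighted linear surrogate to the black box's outputs on perturbed copies of a single instance. A self-contained sketch of that idea with a toy numeric black box (not the lime library, and not text perturbation as a sarcasm classifier would require):

```python
import numpy as np

def lime_local_slopes(black_box, x, n_samples=500, width=0.5, rng=0):
    """Fit a weighted least-squares linear surrogate to black_box around
    instance x. Perturbations are Gaussian; sample weights decay with
    distance from x so nearby samples dominate. The returned per-feature
    slopes are the local explanation."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    Z = x + rng.normal(scale=width, size=(n_samples, len(x)))
    y = np.array([black_box(z) for z in Z])
    # Exponential proximity kernel, as in LIME.
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * width ** 2))
    A = np.hstack([Z, np.ones((n_samples, 1))])  # slopes + intercept
    W = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * W, y * W[:, 0], rcond=None)
    return coef[:-1]  # drop the intercept

# Hypothetical black box: feature 0 matters strongly near x, feature 1 barely.
f = lambda v: 4.0 * v[0] + 0.1 * v[1] ** 2
slopes = lime_local_slopes(f, [1.0, 1.0])
```

The surrogate's slopes approximate the black box's local gradient, which is why LIME explanations are faithful only in a neighbourhood of the explained instance.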
16

Ponn, Thomas, Thomas Kröger, and Frank Diermeyer. "Identification and Explanation of Challenging Conditions for Camera-Based Object Detection of Automated Vehicles." Sensors 20, no. 13 (July 1, 2020): 3699. http://dx.doi.org/10.3390/s20133699.

Abstract:
For a safe market launch of automated vehicles, the risks of the overall system as well as the sub-components must be efficiently identified and evaluated. This also includes camera-based object detection using artificial intelligence algorithms. It is trivial and explainable that due to the principle of the camera, performance depends highly on the environmental conditions and can be poor, for example in heavy fog. However, there are other factors influencing the performance of camera-based object detection, which will be comprehensively investigated for the first time in this paper. Furthermore, a precise modeling of the detection performance and the explanation of individual detection results is not possible due to the artificial intelligence based algorithms used. Therefore, a modeling approach based on the investigated influence factors is proposed and the newly developed SHapley Additive exPlanations (SHAP) approach is adopted to analyze and explain the detection performance of different object detection algorithms. The results show that many influence factors such as the relative rotation of an object towards the camera or the position of an object on the image have basically the same influence on the detection performance regardless of the detection algorithm used. In particular, the revealed weaknesses of the tested object detectors can be used to derive challenging and critical scenarios for the testing and type approval of automated vehicles.
17

Oh, Sejong, Yuli Park, Kyong Jin Cho, and Seong Jae Kim. "Explainable Machine Learning Model for Glaucoma Diagnosis and Its Interpretation." Diagnostics 11, no. 3 (March 13, 2021): 510. http://dx.doi.org/10.3390/diagnostics11030510.

Abstract:
The aim is to develop a machine learning prediction model for the diagnosis of glaucoma and an explanation system for a specific prediction. Clinical data of the patients based on a visual field test, a retinal nerve fiber layer optical coherence tomography (RNFL OCT) test, a general examination including an intraocular pressure (IOP) measurement, and fundus photography were provided for the feature selection process. Five selected features (variables) were used to develop a machine learning prediction model. The support vector machine, C5.0, random forest, and XGboost algorithms were tested for the prediction model. The performance of the prediction models was tested with 10-fold cross-validation. Statistical charts, such as gauge, radar, and Shapley Additive Explanations (SHAP), were used to explain the prediction case. All four models achieved similarly high diagnostic performance, with accuracy values ranging from 0.903 to 0.947. The XGboost model is the best model with an accuracy of 0.947, sensitivity of 0.941, specificity of 0.950, and AUC of 0.945. Three statistical charts were established to explain the prediction based on the characteristics of the XGboost model. Higher diagnostic performance was achieved with the XGboost model. These three statistical charts can help us understand why the machine learning model produces a specific prediction result. This may be the first attempt to apply “explainable artificial intelligence” to eye disease diagnosis.
18

Xie, Yibing, Nichakorn Pongsakornsathien, Alessandro Gardi, and Roberto Sabatini. "Explanation of Machine-Learning Solutions in Air-Traffic Management." Aerospace 8, no. 8 (August 12, 2021): 224. http://dx.doi.org/10.3390/aerospace8080224.

Abstract:
Advances in the trusted autonomy of air-traffic management (ATM) systems are currently being pursued to cope with the predicted growth in air-traffic densities in all classes of airspace. Highly automated ATM systems relying on artificial intelligence (AI) algorithms for anomaly detection, pattern identification, accurate inference, and optimal conflict resolution are technically feasible and demonstrably able to take on a wide variety of tasks currently accomplished by humans. However, the opaqueness and inexplicability of most intelligent algorithms restrict the usability of such technology. Consequently, AI-based ATM decision-support systems (DSS) are foreseen to integrate eXplainable AI (XAI) in order to increase interpretability and transparency of the system reasoning and, consequently, build the human operators’ trust in these systems. This research presents a viable solution to implement XAI in ATM DSS, providing explanations that can be appraised and analysed by the human air-traffic control operator (ATCO). The maturity of XAI approaches and their application in ATM operational risk prediction is investigated in this paper, which can support both existing ATM advisory services in uncontrolled airspace (Classes E and F) and also drive the inflation of avoidance volumes in emerging performance-driven autonomy concepts. In particular, aviation occurrences and meteorological databases are exploited to train a machine learning (ML)-based risk-prediction tool capable of real-time situation analysis and operational risk monitoring. The proposed approach is based on the XGBoost library, which is a gradient-boost decision tree algorithm for which post-hoc explanations are produced by SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME). 
Results are presented and discussed, and considerations are made on the most promising strategies for evolving the human–machine interactions (HMI) to strengthen the mutual trust between ATCO and systems. The presented approach is not limited only to conventional applications but also suitable for UAS-traffic management (UTM) and other emerging applications.
19

Antoniadi, Anna Markella, Miriam Galvin, Mark Heverin, Orla Hardiman, and Catherine Mooney. "Prediction of quality of life in people with ALS." ACM SIGAPP Applied Computing Review 21, no. 2 (June 2021): 5–17. http://dx.doi.org/10.1145/3477127.3477128.

Abstract:
Amyotrophic Lateral Sclerosis (ALS) is a rare neurodegenerative disease that causes a rapid decline in motor functions and has a fatal trajectory. ALS is currently incurable, so the aim of the treatment is mostly to alleviate symptoms and improve quality of life (QoL) for the patients. The goal of this study is to develop a Clinical Decision Support System (CDSS) to alert clinicians when a patient is at risk of experiencing low QoL. The source of data was the Irish ALS Registry and interviews with the 90 patients and their primary informal caregiver at three time-points. In this dataset, there were two different scores to measure a person's overall QoL, based on the McGill QoL (MQoL) Questionnaire, and we worked towards the prediction of both. We used Extreme Gradient Boosting (XGBoost) for the development of the predictive models, which was compared to a logistic regression baseline model. Additionally, we used Synthetic Minority Over-sampling Technique (SMOTE) to examine if that would increase model performance and SHAP (SHapley Additive explanations) as a technique to provide local and global explanations to the outputs as well as to select the most important features. The total calculated MQoL score was predicted accurately using three features - age at disease onset, ALSFRS-R score for orthopnoea and the caregiver's status pre-caregiving - with an F1-score on the test set equal to 0.81, recall of 0.78, and precision of 0.84. The addition of two extra features (caregiver's age and the ALSFRS-R score for speech) produced similar outcomes (F1-score 0.79, recall 0.70 and precision 0.90).
20

Furgała-Selezniow, Grażyna, Małgorzata Jankun-Woźnicka, Marek Kruk, and Aneta A. Omelan. "Land Use and Land Cover Pattern as a Measure of Tourism Impact on a Lakeshore Zone." Land 10, no. 8 (July 27, 2021): 787. http://dx.doi.org/10.3390/land10080787.

Abstract:
Lakes provide different ecosystem services, including those related to tourism and recreation. Sustainable development principles should be respected in lake tourism planning. The aim of this study was to assess the impact of tourism on the lakeshore zone in a typical post-glacial Lakeland in Northern Poland (Central Europe). An explanatory analysis of the distribution of individual spatial factor values was performed using the SHapley Additive exPlanations algorithm (SHAP). In a first step, the aim was to select a Machine Learning model for modelling based on Shapley values. The greater or lesser influence of a given factor on the tourism function was measured for individual lakes. The final results of ensemble modelling and SHAP were obtained by averaging the results of five random repetitions of the execution of these models. The impact of tourism on the lakeshore zone can be much more accurately determined using an indirect method, by analysing the tourism and recreational infrastructure constantly present there. The values of the indices proposed in the study provide indirect information on the number of tourists using the tourist and recreational facilities and are a measure of the impact of tourism on the lakeshore zone. The developed methodology can be applied to the majority of post-glacial lakes in Europe and other regions of the world in order to monitor the threats resulting from shore zone exploitation. Such studies can be an appropriate tool for management and planning by the relevant authorities.
21

Kulaga, Anton Y., Eugen Ursu, Dmitri Toren, Vladyslava Tyshchenko, Rodrigo Guinea, Malvina Pushkova, Vadim E. Fraifeld, and Robi Tacutu. "Machine Learning Analysis of Longevity-Associated Gene Expression Landscapes in Mammals." International Journal of Molecular Sciences 22, no. 3 (January 22, 2021): 1073. http://dx.doi.org/10.3390/ijms22031073.

Abstract:
One of the important questions in aging research is how differences in transcriptomics are associated with the longevity of various species. Unfortunately, at the level of individual genes, the links between expression in different organs and maximum lifespan (MLS) are yet to be fully understood. Analyses are complicated further by the fact that MLS is highly associated with other confounding factors (metabolic rate, gestation period, body mass, etc.) and that linear models may be limiting. Using gene expression from 41 mammalian species, across five organs, we constructed gene-centric regression models associating gene expression with MLS and other species traits. Additionally, we used SHapley Additive exPlanations and Bayesian networks to investigate the non-linear nature of the interrelations between the genes predicted to be determinants of species MLS. Our results revealed that expression patterns correlate with MLS, some across organs, and others in an organ-specific manner. The combination of methods employed revealed gene signatures formed by only a few genes that are highly predictive towards MLS, which could be used to identify novel longevity regulator candidates in mammals.
22

Nor, Ahmad Kamal Mohd. "Failure Prognostic of Turbofan Engines with Uncertainty Quantification and Explainable AI (XIA)." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 3 (April 11, 2021): 3494–504. http://dx.doi.org/10.17762/turcomat.v12i3.1624.

Abstract:
Deep learning is quickly becoming essential to the human ecosystem. However, the opacity of certain deep learning models poses a legal barrier in its adoption for greater purposes. Explainable AI (XAI) is a recent paradigm intended to tackle this issue. It explains the prediction mechanism produced by black box AI models, making it extremely practical for safety, security or financially important decision making. In another aspect, most deep learning studies are based on point estimate prediction with no measure of uncertainty, which is vital for decision making. Obviously, these works are not suitable for real world applications. This paper presents a Remaining Useful Life (RUL) estimation problem for turbofan engines equipped with prognostic explainability and uncertainty quantification. A single input, multi outputs probabilistic Long Short-Term Memory (LSTM) is employed to predict the RULs distribution of the turbofans and SHapley Additive exPlanations (SHAP) approach is applied to explain the prognostic made. The explainable probabilistic LSTM is thus able to express its confidence in predicting and explains the produced estimation. The performance of the proposed method is comparable to several other published works.
APA, Harvard, Vancouver, ISO, and other styles
23

Kim, Donghyun, Gian Antariksa, Melia Putri Handayani, Sangbong Lee, and Jihwan Lee. "Explainable Anomaly Detection Framework for Maritime Main Engine Sensor Data." Sensors 21, no. 15 (July 31, 2021): 5200. http://dx.doi.org/10.3390/s21155200.

Full text
Abstract:
In this study, we proposed a data-driven approach to condition monitoring of marine engines. Although several unsupervised methods exist in the maritime industry, they share a common limitation regarding the interpretation of anomalies: they do not explain why the model classifies specific data instances as anomalies. This study combines explainable AI techniques with an anomaly detection algorithm to overcome that limitation. As the explainable AI method, this study adopts Shapley Additive exPlanations (SHAP), which is theoretically solid and compatible with any kind of machine learning algorithm. SHAP enables us to measure the marginal contribution of each sensor variable to an anomaly, so one can easily specify which sensor is responsible for a specific anomaly. To illustrate our framework, an actual sensor stream collected from a cargo vessel over 10 months was analyzed. In this analysis, we performed hierarchical clustering on the transformed SHAP values to interpret and group common anomaly patterns. We showed that anomaly interpretation and segmentation using SHAP values provide more useful interpretations than the case without them.
APA, Harvard, Vancouver, ISO, and other styles
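The "marginal contribution of each sensor variable" that SHAP measures in the study above can be sketched with an exact Shapley-value computation on a toy anomaly score. This is only an illustration of the underlying game-theoretic idea, not the cited framework; the `anomaly_score` function, the sensor readings and the baseline values are all hypothetical:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    over all coalitions, with absent features held at their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                # Standard Shapley coalition weight |S|! (n-|S|-1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# Hypothetical anomaly score over three engine-sensor readings.
def anomaly_score(v):
    temp, pressure, rpm = v
    return 2.0 * temp + 0.5 * pressure * rpm

x = [3.0, 2.0, 4.0]          # observed (anomalous) readings
baseline = [1.0, 1.0, 1.0]   # typical operating point
phi = shapley_values(anomaly_score, x, baseline)
# Local accuracy: attributions sum to f(x) - f(baseline).
assert abs(sum(phi) - (anomaly_score(x) - anomaly_score(baseline))) < 1e-9
```

Exact enumeration is exponential in the number of features, which is why SHAP relies on model-specific shortcuts (such as its tree-model algorithm) or sampling in practice; the per-feature `phi` values are what make it possible to say which sensor is responsible for a given anomaly.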
24

Ren, Junyu, Benyu Li, Ming Zhao, Hengchu Shi, Hao You, and Jinfu Chen. "Optimization for Data-Driven Preventive Control Using Model Interpretation and Augmented Dataset." Energies 14, no. 12 (June 10, 2021): 3430. http://dx.doi.org/10.3390/en14123430.

Full text
Abstract:
Transient stability preventive control (TSPC) ensures that power systems have a sufficient stability margin by adjusting power flow before faults occur. The generation of TSPC measures requires accuracy and efficiency. In this paper, a novel model interpretation-based multi-fault coordinated data-driven preventive control optimization strategy is proposed. First, an augmented dataset covering the fault information is constructed, enabling the transient stability assessment (TSA) model to discriminate the system stability under different fault scenarios. Then, the adaptive synthetic sampling (ADASYN) method is implemented to deal with the imbalanced instances of power systems. Next, an instance-based machine model interpretation tool, Shapley additive explanations (SHAP), is embedded to explain the TSA model’s predictions and to find out the most effective control objects, thus narrowing the number of control objects. Finally, differential evolution is deployed to optimize the generation of TSPC measures, taking into account the security and economy of TSPC. The proposed method’s efficiency and robustness are verified on the New England 39-bus system and the IEEE 54-machine 118-bus system.
APA, Harvard, Vancouver, ISO, and other styles
25

Guo, Manze, Zhenzhou Yuan, Bruce Janson, Yongxin Peng, Yang Yang, and Wencheng Wang. "Older Pedestrian Traffic Crashes Severity Analysis Based on an Emerging Machine Learning XGBoost." Sustainability 13, no. 2 (January 18, 2021): 926. http://dx.doi.org/10.3390/su13020926.

Full text
Abstract:
Older pedestrians are vulnerable on the streets and at significant risk of injury or death when involved in crashes. Pedestrian safety is critical for roadway agencies to consider and improve, especially for pedestrians older than 65 years. To better protect this group, the factors that contribute to older-pedestrian crashes need to be analyzed in depth. Traditional modeling approaches such as logistic models may lead to modeling distortions due to their independence assumptions. In this study, Extreme Gradient Boosting (XGBoost) is used to model the classification of three severity levels of older-pedestrian traffic crashes from crash data in Colorado, US. Further, Shapley Additive explanations (SHAP) are implemented to interpret the XGBoost model results and analyze the importance of each feature in relation to the severity levels. The interpretation results show that driver characteristics, older pedestrian characteristics, and vehicle movement are the most important factors influencing the probability of the three severity levels. These results identify the factors correlated with each severity level, which can inform traffic management and road infrastructure departments in protecting older pedestrians by controlling or managing some of those significant features.
APA, Harvard, Vancouver, ISO, and other styles
26

Stender, Merten, Mathies Wedler, Norbert Hoffmann, and Christian Adams. "Explainable machine learning: A case study on impedance tube measurements." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 263, no. 3 (August 1, 2021): 3223–34. http://dx.doi.org/10.3397/in-2021-2342.

Full text
Abstract:
Machine learning (ML) techniques allow for finding hidden patterns and signatures in data. Currently, these methods are gaining increased interest in engineering in general and in vibroacoustics in particular. Although ML methods are successfully applied, it is hardly understood how these black box-type methods make their decisions. Explainable machine learning aims at overcoming this issue by deepening the understanding of the decision-making process through perturbation-based model diagnosis. This paper introduces machine learning methods and reviews recent techniques for explainability and interpretability. These methods are exemplified on sound absorption coefficient spectra of one sound absorbing foam material measured in an impedance tube. Variances of the absorption coefficient measurements as a function of the specimen thickness and the operator are modeled by univariate and multivariate machine learning models. In order to identify the driving patterns, i.e. how and in which frequency regime the measurements are affected by the setup specifications, Shapley additive explanations are derived for the ML models. It is demonstrated how explaining machine learning models can be used to discover and express complicated relations in experimental data, thereby paving the way to novel knowledge discovery strategies in evidence-based modeling.
APA, Harvard, Vancouver, ISO, and other styles
27

Chen, Hengrui, Hong Chen, Ruiyu Zhou, Zhizhen Liu, and Xiaoke Sun. "Exploring the Mechanism of Crashes with Autonomous Vehicles Using Machine Learning." Mathematical Problems in Engineering 2021 (February 26, 2021): 1–10. http://dx.doi.org/10.1155/2021/5524356.

Full text
Abstract:
The safety issue has become a critical obstacle that cannot be ignored in the marketization of autonomous vehicles (AVs). The objective of this study is to explore the mechanism of AV-involved crashes and analyze the impact of each feature on crash severity. We use the Apriori algorithm to explore the causal relationship between multiple factors to explore the mechanism of crashes. We use various machine learning models, including support vector machine (SVM), classification and regression tree (CART), and eXtreme Gradient Boosting (XGBoost), to analyze the crash severity. Besides, we apply the Shapley Additive Explanations (SHAP) to interpret the importance of each factor. The results indicate that XGBoost obtains the best result (recall = 75%; G-mean = 67.82%). Both XGBoost and Apriori algorithm effectively provided meaningful insights about AV-involved crash characteristics and their relationship. Among all these features, vehicle damage, weather conditions, accident location, and driving mode are the most critical features. We found that most rear-end crashes are conventional vehicles bumping into the rear of AVs. Drivers should be extremely cautious when driving in fog, snow, and insufficient light. Besides, drivers should be careful when driving near intersections, especially in the autonomous driving mode.
APA, Harvard, Vancouver, ISO, and other styles
28

Jeon, Junhyub, Namhyuk Seo, Seung Bae Son, Seok-Jae Lee, and Minsu Jung. "Application of Machine Learning Algorithms and SHAP for Prediction and Feature Analysis of Tempered Martensite Hardness in Low-Alloy Steels." Metals 11, no. 8 (July 22, 2021): 1159. http://dx.doi.org/10.3390/met11081159.

Full text
Abstract:
The tempering of low-alloy steels is important for controlling the mechanical properties required for industrial fields. Several studies have investigated the relationships between the input and target values of materials using machine learning algorithms. The limitation of machine learning algorithms is that the mechanism of how the input values affect the output has yet to be confirmed despite numerous case studies. To address this issue, we trained four machine learning algorithms to control the hardness of low-alloy steels under various tempering conditions. The models were trained using the tempering temperature, holding time, and composition of the alloy as the inputs. The input data were drawn from a database of more than 1900 experimental datasets for low-alloy steels created from the relevant literature. We selected the random forest regression (RFR) model to analyze its mechanism and the importance of the input values using Shapley additive explanations (SHAP). The prediction accuracy of the RFR for the tempered martensite hardness was better than that of the empirical equation. The tempering temperature is the most important feature for controlling the hardness, followed by the C content, the holding time, and the Cr, Si, Mn, Mo, and Ni contents.
APA, Harvard, Vancouver, ISO, and other styles
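When an exact tree algorithm is unavailable, SHAP-style importance analyses like the one above are commonly approximated by sampling random feature orderings (the Štrumbelj–Kononenko estimator). A minimal stdlib sketch follows; the `hardness` surrogate function and its coefficients are invented for illustration and are not fitted to the paper's dataset:

```python
import random

def sample_shapley(f, x, baseline, n_perm=200, seed=0):
    """Monte Carlo Shapley estimate: average each feature's marginal
    contribution over random feature orderings."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)   # start from the baseline point
        prev = f(current)
        for j in order:
            current[j] = x[j]      # switch feature j on in this ordering
            now = f(current)
            phi[j] += now - prev
            prev = now
    return [p / n_perm for p in phi]

# Hypothetical surrogate for hardness as a function of
# (tempering temperature, C content, holding time).
def hardness(v):
    temp, carbon, time = v
    return 800 - 0.5 * temp + 300 * carbon - 2.0 * time + 0.1 * temp * carbon

x = [500.0, 0.4, 30.0]
baseline = [600.0, 0.2, 60.0]
phi = sample_shapley(hardness, x, baseline)
# Efficiency holds for every sampled ordering, hence for the average.
assert abs(sum(phi) - (hardness(x) - hardness(baseline))) < 1e-6
```

With enough permutations the per-feature estimates converge to the exact Shapley values; note that the interaction-free holding-time term is recovered exactly by every ordering, while the temperature/carbon interaction is shared between those two features.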
29

Tamura, Shunsuke, Swarit Jasial, Tomoyuki Miyao, and Kimito Funatsu. "Interpretation of Ligand-Based Activity Cliff Prediction Models Using the Matched Molecular Pair Kernel." Molecules 26, no. 16 (August 13, 2021): 4916. http://dx.doi.org/10.3390/molecules26164916.

Full text
Abstract:
Activity cliffs (ACs) are formed by two structurally similar compounds with a large difference in potency. Accurate AC prediction is expected to help researchers’ decisions in the early stages of drug discovery. Previously, predictive models based on matched molecular pair (MMP) cliffs have been proposed. However, the proposed methods face a challenge of interpretability due to the black-box character of the predictive models. In this study, we developed interpretable MMP fingerprints and modified a model-specific interpretation approach for models based on a support vector machine (SVM) and MMP kernel. We compared important features highlighted by this SVM-based interpretation approach and the SHapley Additive exPlanations (SHAP) as a major model-independent approach. The model-specific approach could capture the difference between AC and non-AC, while SHAP assigned high weights to the features not present in the test instances. For specific MMPs, the feature weights mapped by the SVM-based interpretation method were in agreement with the previously confirmed binding knowledge from X-ray co-crystal structures, indicating that this method is able to interpret the AC prediction model in a chemically intuitive manner.
APA, Harvard, Vancouver, ISO, and other styles
30

Kang, Eun Ae, Jongha Jang, Chang Hwan Choi, Sang Bum Kang, Ki Bae Bang, Tae Oh Kim, Geom Seog Seo, et al. "Development of a Clinical and Genetic Prediction Model for Early Intestinal Resection in Patients with Crohn’s Disease: Results from the IMPACT Study." Journal of Clinical Medicine 10, no. 4 (February 7, 2021): 633. http://dx.doi.org/10.3390/jcm10040633.

Full text
Abstract:
Early intestinal resection in patients with Crohn’s disease (CD) is necessary due to a severe and complicating disease course. Herein, we aim to predict which patients with CD need early intestinal resection within 3 years of diagnosis, according to a tree-based machine learning technique. The single-nucleotide polymorphism (SNP) genotype data for 337 CD patients recruited from 15 hospitals were typed using the Korea Biobank Array. For external validation, an additional 126 CD patients were genotyped. The predictive model was trained using the 102 candidate SNPs and seven sets of clinical information (age, sex, cigarette smoking, disease location, disease behavior, upper gastrointestinal involvement, and perianal disease) by employing a tree-based machine learning method (CatBoost). The importance of each feature was measured using the Shapley Additive Explanations (SHAP) model. The final model comprised two clinical parameters (age and disease behavior) and four SNPs (rs28785174, rs60532570, rs13056955, and rs7660164). The combined clinical–genetic model predicted early surgery more accurately than a clinical-only model in both internal (area under the receiver operating characteristic (AUROC), 0.878 vs. 0.782; n = 51; p < 0.001) and external validation (AUROC, 0.836 vs. 0.805; n = 126; p < 0.001). Identification of genetic polymorphisms and clinical features enhanced the prediction of early intestinal resection in patients with CD.
APA, Harvard, Vancouver, ISO, and other styles
31

Abdollahi, Abolfazl, and Biswajeet Pradhan. "Urban Vegetation Mapping from Aerial Imagery Using Explainable AI (XAI)." Sensors 21, no. 14 (July 11, 2021): 4738. http://dx.doi.org/10.3390/s21144738.

Full text
Abstract:
Urban vegetation mapping is critical in many applications, e.g., preserving biodiversity, maintaining ecological balance, and minimizing the urban heat island effect. It is still challenging to extract accurate vegetation cover from aerial imagery using traditional classification approaches, because urban vegetation categories have complex spatial structures and similar spectral properties. Deep neural networks (DNNs) have shown significant improvements in remote sensing image classification outcomes during the last few years. These methods are promising in this domain, yet unreliable for various reasons, such as the use of irrelevant descriptor features in building the models and a lack of quality in the labeled images. Explainable AI (XAI) can help us gain insight into these limits and, as a result, adjust the training dataset and model as needed. Thus, in this work, we explain how an explanation model called Shapley additive explanations (SHAP) can be utilized for interpreting the output of a DNN model designed for classifying vegetation cover. We want not only to produce high-quality vegetation maps, but also to rank the input parameters and select appropriate features for classification. Therefore, we test our method on vegetation mapping from aerial imagery based on spectral and textural features; texture features can help overcome the limitations of poor spectral resolution in aerial imagery for vegetation mapping. The model was capable of obtaining an overall accuracy (OA) of 94.44% for vegetation cover mapping. The conclusions derived from the SHAP plots demonstrate the high contribution of features such as Hue, Brightness, GLCM_Dissimilarity, GLCM_Homogeneity, and GLCM_Mean to the output of the proposed model for vegetation mapping. Therefore, the study indicates that existing vegetation mapping strategies based only on spectral characteristics are insufficient to appropriately classify vegetation covers.
APA, Harvard, Vancouver, ISO, and other styles
32

Fiok, Krzysztof, Waldemar Karwowski, Edgar Gutierrez, and Tareq Ahram. "Predicting the Volume of Response to Tweets Posted by a Single Twitter Account." Symmetry 12, no. 6 (June 25, 2020): 1054. http://dx.doi.org/10.3390/sym12061054.

Full text
Abstract:
Social media users, including organizations, often struggle to acquire the maximum number of responses from other users, but predicting the responses that a post will receive before publication is highly desirable. Previous studies have analyzed why a given tweet may become more popular than others, and have used a variety of models trained to predict the response that a given tweet will receive. The present research addresses the prediction of response measures available on Twitter, including likes, replies and retweets. Data from a single publisher, the official US Navy Twitter account, were used to develop a feature-based model derived from structured tweet-related data. Most importantly, a deep learning feature extraction approach for analyzing unstructured tweet text was applied. A classification task with three classes, representing low, moderate and high responses to tweets, was defined and addressed using four machine learning classifiers. All proposed models were symmetrically trained in a fivefold cross-validation regime using various feature configurations, which allowed for the methodically sound comparison of prediction approaches. The best models achieved F1 scores of 0.655. Our study also used SHapley Additive exPlanations (SHAP) to demonstrate limitations in the research on explainable AI methods involving Deep Learning Language Modeling in NLP. We conclude that model performance can be significantly improved by leveraging additional information from the images and links included in tweets.
APA, Harvard, Vancouver, ISO, and other styles
33

Kim, Jeong-Kyun, Myung-Nam Bae, Kang Bok Lee, and Sang Gi Hong. "Identification of Patients with Sarcopenia Using Gait Parameters Based on Inertial Sensors." Sensors 21, no. 5 (March 4, 2021): 1786. http://dx.doi.org/10.3390/s21051786.

Full text
Abstract:
Sarcopenia can cause various senile diseases and is a major factor associated with the quality of life in old age. To diagnose, assess, and monitor muscle loss in daily life, 10 sarcopenia and 10 normal subjects were selected using lean mass index and grip strength, and their gait signals obtained from inertial sensor-based gait devices were analyzed. Given that the inertial sensor can measure the acceleration and angular velocity, it is highly useful in the kinematic analysis of walking. This study detected spatial-temporal parameters used in clinical practice and descriptive statistical parameters for all seven gait phases for detailed analyses. To increase the accuracy of sarcopenia identification, we used Shapley Additive explanations to select important parameters that facilitated high classification accuracy. Support vector machines (SVM), random forest, and multilayer perceptron are classification methods that require traditional feature extraction, whereas deep learning methods use raw data as input to identify sarcopenia. As a result, the input that used the descriptive statistical parameters for the seven gait phases obtained higher accuracy. The knowledge-based gait parameter detection was more accurate in identifying sarcopenia than automatic feature selection using deep learning. The highest accuracy of 95% was achieved using an SVM model with 20 descriptive statistical parameters. Our results indicate that sarcopenia can be monitored with a wearable device in daily life.
APA, Harvard, Vancouver, ISO, and other styles
34

Kuchin, Yan, Ravil Mukhamediev, Kirill Yakunin, Janis Grundspenkis, and Adilkhan Symagulov. "Assessing the Impact of Expert Labelling of Training Data on the Quality of Automatic Classification of Lithological Groups Using Artificial Neural Networks." Applied Computer Systems 25, no. 2 (December 1, 2020): 145–52. http://dx.doi.org/10.2478/acss-2020-0016.

Full text
Abstract:
Machine learning (ML) methods are nowadays widely used to automate geophysical studies. Some ML algorithms are used to solve lithological classification problems during the uranium mining process. One of the key aspects of using classical ML methods is choosing data features and estimating their influence on the classification. This paper presents a quantitative assessment of the impact of expert opinions on the classification process. In other words, we prepared the data, identified the experts and performed a series of experiments with and without supplying the expert identifier to the input of the automatic classifier during training and testing. A feedforward artificial neural network (ANN) was used as the classifier. The results of the experiments show that the “knowledge” of the ANN of which expert interpreted the data improves the quality of the automatic classification in terms of accuracy (by 5 %) and recall (by 20 %). However, because the input parameters of the model may depend on each other, the SHapley Additive exPlanations (SHAP) method was used to further assess the impact of the expert identifier. SHAP allowed assessing the degree of parameter influence and revealed that the expert ID is at least twice as influential as any other input parameter of the neural network. This circumstance imposes significant restrictions on the application of ANNs to the task of lithological classification at uranium deposits.
APA, Harvard, Vancouver, ISO, and other styles
35

Zheng, Bowen, Yong Cai, Fengxia Zeng, Min Lin, Jun Zheng, Weiguo Chen, Genggeng Qin, and Yi Guo. "An Interpretable Model-Based Prediction of Severity and Crucial Factors in Patients with COVID-19." BioMed Research International 2021 (March 1, 2021): 1–9. http://dx.doi.org/10.1155/2021/8840835.

Full text
Abstract:
This study established an interpretable machine learning model to predict the severity of coronavirus disease 2019 (COVID-19) and output the most crucial deterioration factors. Clinical information, laboratory tests, and chest computed tomography (CT) scans at admission were collected. Two experienced radiologists reviewed the scans for the patterns, distribution, and CT scores of lung abnormalities. Six machine learning models were established to predict the severity of COVID-19. After parameter tuning and performance comparison, the optimal model was explained using Shapley Additive explanations to output the crucial factors. This study enrolled and classified 198 patients into mild (n = 162; 46.93 ± 14.49 years old) and severe (n = 36; 60.97 ± 15.91 years old) groups. The severe group had a higher temperature (37.42 ± 0.99 °C vs. 36.75 ± 0.66 °C), CT score at admission, neutrophil count, and neutrophil-to-lymphocyte ratio than the mild group. The XGBoost model ranked first among all models, with an AUC, sensitivity, and specificity of 0.924, 90.91%, and 97.96%, respectively. The early stage of chest CT, total CT score of the percentage of lung involvement, and age were the top three contributors to the prediction of the deterioration of XGBoost. A higher total score on chest CT had a more significant impact on the prediction. In conclusion, the XGBoost model to predict the severity of COVID-19 achieved excellent performance and output the essential factors in the deterioration process, which may help with early clinical intervention, improve prognosis, and reduce mortality.
APA, Harvard, Vancouver, ISO, and other styles
36

Yang, He, Emma Li, Yi Fang Cai, Jiapei Li, and George X. Yuan. "The extraction of early warning features for predicting financial distress based on XGBoost model and shap framework." International Journal of Financial Engineering 08, no. 03 (June 23, 2021): 2141004. http://dx.doi.org/10.1142/s2424786321410048.

Full text
Abstract:
The purpose of this paper is to establish a framework for extracting early warning risk features for predicting financial distress based on the XGBoost model and SHAP. The construction of early warning risk features to predict the financial distress of companies is very important; compared with traditional statistical methods, data-driven machine learning models for financial early warning perform better in terms of prediction accuracy, but they also bring difficulties, such as that the corresponding model may not be well explained. Recently, eXtreme Gradient Boosting (XGBoost), an ensemble learning algorithm based on extreme gradient boosting, has become a hot topic in the machine learning research field due to its strong nonlinear information recognition ability and high prediction accuracy in practice. In this study, the XGBoost algorithm is used to extract early warning features for predicting financial distress in listed companies, with 76 financial risk features from seven categories and 14 non-financial risk features from four categories collected to establish an early warning system for the prediction of financial distress. In empirical testing with respect to AUC, KS and Kappa, the numerical results show that, compared with the Logistic model, our XGBoost-based method has a much better ability to predict the financial distress risk of listed companies. Moreover, under the framework of SHAP (SHapley Additive exPlanations), we are able to give a reasonable explanation of the important risk features and of the ways they visibly affect financial distress. The results show that the XGBoost approach to modeling early warning features for financial distress not only achieves better prediction accuracy, but is also explainable, which is significant for the early identification of financial distress risk in listed companies in practice.
APA, Harvard, Vancouver, ISO, and other styles
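For purely additive scores, the SHAP decomposition reduces to per-feature terms, and a global early-warning ranking can be read off as the mean absolute attribution across companies, which is what SHAP summary plots display. A stdlib-only sketch of that ranking step follows; the risk coefficients, feature names and instances below are hypothetical and are not taken from the cited study:

```python
def risk_score(x, coefs):
    """Hypothetical additive (linear) early-warning score."""
    return sum(c * xi for c, xi in zip(coefs, x))

def risk_terms(x, baseline, coefs):
    """Per-feature attributions for an additive score; for such models
    these coincide with the exact SHAP values."""
    return [c * (xi - bi) for c, xi, bi in zip(coefs, x, baseline)]

coefs = [1.2, -0.8, -2.0]            # hypothetical coefficients
baseline = [0.5, 1.5, 0.08]          # population-average feature values
names = ["debt_ratio", "current_ratio", "roa"]
instances = [
    [0.9, 0.7, -0.05],
    [0.6, 1.2, 0.02],
    [0.8, 1.0, 0.10],
]

# Local accuracy for one company: terms sum to score(x) - score(baseline).
t0 = risk_terms(instances[0], baseline, coefs)
assert abs(sum(t0) - (risk_score(instances[0], coefs)
                      - risk_score(baseline, coefs))) < 1e-9

# Global importance: mean |attribution| across instances.
attr = [risk_terms(x, baseline, coefs) for x in instances]
importance = {n: sum(abs(a[i]) for a in attr) / len(attr)
              for i, n in enumerate(names)}
ranking = sorted(importance, key=importance.get, reverse=True)
```

For a non-additive model such as XGBoost the per-feature terms are no longer closed-form and must come from a SHAP explainer, but the aggregation into a global ranking works the same way.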
37

Kang, Yoojin, Eunna Jang, Jungho Im, Chungeun Kwon, and Sungyong Kim. "Developing a New Hourly Forest Fire Risk Index Based on Catboost in South Korea." Applied Sciences 10, no. 22 (November 19, 2020): 8213. http://dx.doi.org/10.3390/app10228213.

Full text
Abstract:
Forest fires can cause enormous damage, such as deforestation and environmental pollution, even with a single occurrence. It takes a lot of effort and a long time to restore areas damaged by wildfires. Therefore, it is crucial to know the forest fire risk of a region to appropriately prepare and respond to such disastrous events. The purpose of this study is to develop an hourly forest fire risk index (HFRI) with 1 km spatial resolution using accessibility, fuel, time, and weather factors based on Catboost machine learning over South Korea. HFRI was calculated through an ensemble model that combined an integrated model using all factors and a meteorological model using weather factors only. To confirm the generalized performance of the proposed model, all forest fires that occurred from 2014 to 2019 were validated using the receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) values through one-year-out cross-validation. The AUC value of the HFRI ensemble model was 0.8434, higher than that of the meteorological model. HFRI was compared with the modified version of the Fine Fuel Moisture Code (FFMC) used in the Canadian Forest Fire Danger Rating Systems and the Daily Weather Index (DWI), South Korea’s current forest fire risk index. When compared to DWI and the revised FFMC, HFRI enabled a more spatially detailed and seasonally stable forest fire risk simulation. In addition, the feature contribution to the forest fire risk prediction was analyzed through the Shapley Additive exPlanations (SHAP) values of Catboost. The contributing variables were, in order, relative humidity, elevation, road density, and population density. It was confirmed that the accessibility factors played very important roles in forest fire risk modeling, where most forest fires were caused by anthropogenic factors. The interaction between the variables was also examined.
APA, Harvard, Vancouver, ISO, and other styles
38

Tahfim, Syed As-Sadeq, and Chen Yan. "Analysis of Severe Injuries in Crashes Involving Large Trucks Using K-Prototypes Clustering-Based GBDT Model." Safety 7, no. 2 (April 29, 2021): 32. http://dx.doi.org/10.3390/safety7020032.

Full text
Abstract:
The unobserved heterogeneity in traffic crash data hides certain relationships between contributory factors and injury severity. The literature has been limited in exploring different types of clustering methods for analyzing injury severity in crashes involving large trucks. Additionally, the variability of data types in traffic crash data has rarely been addressed. This study explored the application of the k-prototypes clustering method to counter the unobserved heterogeneity in large-truck-involved crashes that occurred in the United States between 2016 and 2019. The study segmented the entire dataset (EDS) into three homogeneous clusters. Four gradient boosted decision tree (GBDT) models were developed on the EDS and the individual clusters to predict the injury severity of crashes involving large trucks. The list of input features included crash characteristics, truck characteristics, roadway attributes, time and location of the crash, and environmental factors. Each cluster-based GBDT model was compared with the EDS-based model. Two of the three cluster-based models showed significant improvement in their predictive performance. Additionally, feature analysis using the SHAP (Shapley additive explanations) method identified a few new important features in each cluster and showed that some features have differing degrees of effect on severe injuries across the individual clusters. The current study concluded that the k-prototypes clustering-based GBDT model is a promising approach to reveal hidden insights, which can be used to improve safety measures, roadway conditions and policies for the prevention of severe injuries in crashes involving large trucks.
APA, Harvard, Vancouver, ISO, and other styles
39

Pan, Pan, Yichao Li, Yongjiu Xiao, Bingchao Han, Longxiang Su, Mingliang Su, Yansheng Li, et al. "Prognostic Assessment of COVID-19 in the Intensive Care Unit by Machine Learning Methods: Model Development and Validation." Journal of Medical Internet Research 22, no. 11 (November 11, 2020): e23128. http://dx.doi.org/10.2196/23128.

Full text
Abstract:
Background Patients with COVID-19 in the intensive care unit (ICU) have a high mortality rate, and methods to assess patients’ prognosis early and administer precise treatment are of great significance. Objective The aim of this study was to use machine learning to construct a model for the analysis of risk factors and prediction of mortality among ICU patients with COVID-19. Methods In this study, 123 patients with COVID-19 in the ICU of Vulcan Hill Hospital were retrospectively selected from the database, and the data were randomly divided into a training data set (n=98) and test data set (n=25) with a 4:1 ratio. Significance tests, correlation analysis, and factor analysis were used to screen 100 potential risk factors individually. Conventional logistic regression methods and four machine learning algorithms were used to construct the risk prediction model for the prognosis of patients with COVID-19 in the ICU. The performance of these machine learning models was measured by the area under the receiver operating characteristic curve (AUC). Interpretation and evaluation of the risk prediction model were performed using calibration curves, SHapley Additive exPlanations (SHAP), Local Interpretable Model-Agnostic Explanations (LIME), etc., to ensure its stability and reliability. The outcome was based on the ICU deaths recorded in the database. Results Layer-by-layer screening of 100 potential risk factors finally revealed 8 important risk factors that were included in the risk prediction model: lymphocyte percentage, prothrombin time, lactate dehydrogenase, total bilirubin, eosinophil percentage, creatinine, neutrophil percentage, and albumin level. Finally, an eXtreme Gradient Boosting (XGBoost) model established with the 8 important risk factors showed the best recognition ability in the training set under 5-fold cross-validation (AUC=0.86) and in the verification cohort (AUC=0.92).
The calibration curve showed that the risk predicted by the model was in good agreement with the actual risk. In addition, using the SHAP and LIME algorithms, feature interpretation and sample prediction interpretation algorithms of the XGBoost black box model were implemented. Additionally, the model was translated into a web-based risk calculator that is freely available for public usage. Conclusions The 8-factor XGBoost model predicts risk of death in ICU patients with COVID-19 well; it initially demonstrates stability and can be used effectively to predict COVID-19 prognosis in ICU patients.
APA, Harvard, Vancouver, ISO, and other styles
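The entries in this selection repeatedly apply SHAP, which assigns each feature an additive contribution derived from the Shapley value of cooperative game theory. As a self-contained illustration (not taken from any of the cited studies, and using a hypothetical toy model and baseline), the exact Shapley value can be computed directly from its definition, with absent features replaced by baseline values:

```python
# Exact Shapley values from the definition:
#   phi_i = sum over subsets S not containing i of
#           |S|! (n-|S|-1)! / n! * (f(x with S u {i}) - f(x with S)),
# where features outside the coalition are set to a baseline value.
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# For a purely additive model, each feature's Shapley value recovers
# its own term relative to the baseline.
f = lambda v: 2.0 * v[0] + 3.0 * v[1] + v[2]
phi = shapley_values(f, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # approximately [2.0, 3.0, 1.0]; the values sum to f(x) - f(baseline)
```

This brute-force sum is exponential in the number of features; the SHAP library's tree explainer computes the same quantities efficiently for tree ensembles such as XGBoost and random forests, which is what the studies above rely on.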
40

Ferrari, Davide, Jovana Milic, Roberto Tonelli, Francesco Ghinelli, Marianna Meschiari, Sara Volpi, Matteo Faltoni, et al. "Machine learning in predicting respiratory failure in patients with COVID-19 pneumonia—Challenges, strengths, and opportunities in a global health emergency." PLOS ONE 15, no. 11 (November 12, 2020): e0239172. http://dx.doi.org/10.1371/journal.pone.0239172.

Full text
Abstract:
Aims: The aim of this study was to estimate a 48 hour prediction of moderate to severe respiratory failure, requiring mechanical ventilation, in hospitalized patients with COVID-19 pneumonia. Methods: This was an observational prospective study that comprised consecutive patients with COVID-19 pneumonia admitted to hospital from 21 February to 6 April 2020. The patients’ medical history, demographic, epidemiologic and clinical data were collected in an electronic patient chart. The dataset was used to train predictive models using an established machine learning framework leveraging a hybrid approach where clinical expertise is applied alongside a data-driven analysis. The study outcome was the onset of moderate to severe respiratory failure defined as PaO2/FiO2 ratio <150 mmHg in at least one of two consecutive arterial blood gas analyses in the following 48 hours. Shapley Additive exPlanations values were used to quantify the positive or negative impact of each variable included in each model on the predicted outcome. Results: A total of 198 patients contributed 1068 usable observations, which allowed us to build 3 predictive models based respectively on 31 variables (signs and symptoms), 39 variables (laboratory biomarkers) and 91 variables (a composition of the two). A fourth “boosted mixed model”, including 20 variables selected from model 3, achieved the best predictive performance (AUC = 0.84) without worsening the false negative rate. Its clinical performance was applied in a narrative case report as an example. Conclusion: This study developed a machine learning model with 84% prediction accuracy, which is able to assist clinicians in the decision making process and contribute to developing new analytics to improve care at high technology readiness levels.
APA, Harvard, Vancouver, ISO, and other styles
41

Pai, Kai-Chih, Min-Shian Wang, Yun-Feng Chen, Chien-Hao Tseng, Po-Yu Liu, Lun-Chi Chen, Ruey-Kai Sheu, and Chieh-Liang Wu. "An Artificial Intelligence Approach to Bloodstream Infections Prediction." Journal of Clinical Medicine 10, no. 13 (June 29, 2021): 2901. http://dx.doi.org/10.3390/jcm10132901.

Full text
Abstract:
This study aimed to develop an early prediction model for identifying patients with bloodstream infections. The data resource was taken from 2015 to 2019 at Taichung Veterans General Hospital, and a total of 1647 bloodstream infection episodes and 3552 non-bloodstream infection episodes in the intensive care unit (ICU) were included in the model development and evaluation. During the data analysis, 30 clinical variables were selected, including patients’ basic characteristics, vital signs, laboratory data, and clinical information. Five machine learning algorithms were applied to examine the prediction model performance. The findings indicated that the area under the receiver operating characteristic curve (AUROC) of the prediction performance of the XGBoost model was 0.825 for the validation dataset and 0.821 for the testing dataset. The random forest model also presented higher values for the AUROC on the validation dataset and testing dataset, which were 0.855 and 0.851, respectively. The tree-based ensemble learning model enabled high detection ability for patients with bloodstream infections in the ICU. Additionally, the analysis of importance of features revealed that alkaline phosphatase (ALKP) and the period of the central venous catheter are the most important predictors for bloodstream infections. We further explored the relationship between features and the risk of bloodstream infection by using the Shapley Additive exPlanations (SHAP) visualized method. The results showed that a higher prothrombin time is more prominent in a bloodstream infection. Additionally, the impact of a lower platelet count and albumin was more prominent in a bloodstream infection. Our results provide additional clinical information for cut-off laboratory values to assist clinical decision-making in bloodstream infection diagnostics.
APA, Harvard, Vancouver, ISO, and other styles
42

Park, Kwang Ho, Erdenebileg Batbaatar, Yongjun Piao, Nipon Theera-Umpon, and Keun Ho Ryu. "Deep Learning Feature Extraction Approach for Hematopoietic Cancer Subtype Classification." International Journal of Environmental Research and Public Health 18, no. 4 (February 23, 2021): 2197. http://dx.doi.org/10.3390/ijerph18042197.

Full text
Abstract:
Hematopoietic cancer is a malignant transformation in immune system cells. Hematopoietic cancer is characterized by the cells that are expressed, so it is usually difficult to distinguish its heterogeneities in the hematopoiesis process. Traditional approaches for cancer subtyping use statistical techniques. Furthermore, due to the overfitting problem of small samples, in the case of a minor cancer there is not enough sample material for building a classification model. Therefore, we propose not only to build a classification model for five major subtypes using two kinds of losses, namely reconstruction loss and classification loss, but also to extract suitable features using a deep autoencoder. Furthermore, to address the data imbalance problem, we apply an oversampling algorithm, the synthetic minority oversampling technique (SMOTE). For validation of our proposed autoencoder-based feature extraction approach for hematopoietic cancer subtype classification, we compared other traditional feature selection algorithms (principal component analysis, non-negative matrix factorization) and classification algorithms with the SMOTE oversampling approach. Additionally, we used the Shapley Additive exPlanations (SHAP) interpretation technique in our model to explain the important genes/proteins for hematopoietic cancer subtype classification. Furthermore, we compared five widely used classification algorithms, including logistic regression, random forest, k-nearest neighbor, artificial neural network and support vector machine.
The results of the autoencoder-based feature extraction approaches showed good performance, and the best result was obtained by the SMOTE oversampling-applied support vector machine algorithm, considering both focal loss and reconstruction loss as the loss function for the autoencoder (AE) feature selection approach, which produced 97.01% accuracy, 92.60% recall, 99.52% specificity, 93.54% F1-measure, 97.87% G-mean and 95.46% index of balanced accuracy as subtype classification performance measures.
APA, Harvard, Vancouver, ISO, and other styles
43

Wang, Fang, Chun-Shuang Xu, Wei-Hua Chen, Shi-Wei Duan, Shu-Jun Xu, Jun-Jie Dai, and Qin-Wen Wang. "Identification of Blood-Based Glycolysis Gene Associated with Alzheimer’s Disease by Integrated Bioinformatics Analysis." Journal of Alzheimer's Disease 83, no. 1 (August 31, 2021): 163–78. http://dx.doi.org/10.3233/jad-210540.

Full text
Abstract:
Background: Alzheimer’s disease (AD) is one of many common neurodegenerative diseases without ideal treatment, but early detection and intervention can prevent the disease progression. Objective: This study aimed to identify AD-related glycolysis genes for AD diagnosis and further investigation by integrated bioinformatics analysis. Methods: 122 subjects were recruited from the affiliated hospitals of Ningbo University between 1 October 2015 and 31 December 2016. Their clinical information and methylation levels of 8 glycolysis genes were assessed. Machine learning algorithms were used to establish an AD prediction model. Receiver operating characteristic (ROC) curves and decision curve analysis (DCA) were used to assess the model. An AD risk factor model was developed by SHapley Additive exPlanations (SHAP) to extract features that had important impacts on AD. Finally, gene expression of AD-related glycolysis genes was validated by AlzData. Results: An AD prediction model was developed using the random forest algorithm with the best average ROC_AUC (0.969544). The threshold probability of the model was positive in the range of 0–0.9875 by DCA. Eight glycolysis genes (GAPDHS, PKLR, PFKFB3, LDHC, DLD, ALDOC, LDHB, HK3) were identified by SHAP. Five of these genes (PFKFB3, DLD, ALDOC, LDHB, LDHC) show significant differences in gene expression between AD and control groups in AlzData, while three of the genes (HK3, ALDOC, PKLR) are related to the pathogenesis of AD. GAPDHS is involved in the regulatory network of AD risk genes. Conclusion: We identified 8 AD-related glycolysis genes (GAPDHS, PFKFB3, LDHC, HK3, ALDOC, LDHB, PKLR, DLD) as promising candidate biomarkers for the early diagnosis of AD by integrated bioinformatics analysis. Machine learning has an advantage in identifying such genes.
APA, Harvard, Vancouver, ISO, and other styles
44

Martínez-Florez, Juan F., Juan D. Osorio, Judith C. Cediel, Juan C. Rivas, Ana M. Granados-Sánchez, Jéssica López-Peláez, Tania Jaramillo, and Juan F. Cardona. "Short-Term Memory Binding Distinguishing Amnestic Mild Cognitive Impairment from Healthy Aging: A Machine Learning Study." Journal of Alzheimer's Disease 81, no. 2 (May 18, 2021): 729–42. http://dx.doi.org/10.3233/jad-201447.

Full text
Abstract:
Background: Amnestic mild cognitive impairment (aMCI) is the most common preclinical stage of Alzheimer’s disease (AD). A strategy to reduce the impact of AD is early aMCI diagnosis and clinical intervention. Neuroimaging, neurobiological, and genetic markers have proved to be sensitive and specific for the early diagnosis of AD. However, the high cost of these procedures is prohibitive in low-income and middle-income countries (LMICs). Neuropsychological assessments currently aim to identify cognitive markers that could contribute to the early diagnosis of dementia. Objective: To compare machine learning (ML) architectures classifying and predicting aMCI, and to assess the contribution of cognitive measures, including binding function, to the distinction and prediction of aMCI. Methods: We conducted a two-year follow-up assessment of a sample of 154 subjects with a comprehensive multidomain neuropsychological battery. Statistical analysis used complete ML architectures to compare subjects’ performance in classifying and predicting aMCI. Additionally, permutation importance and Shapley additive explanations (SHAP) routines were implemented for feature importance selection. Results: AdaBoost, gradient boosting, and XGBoost had the highest performance, with over 80% success in classifying aMCI, and decision tree and random forest had the highest performance, with over 70% success, in predictive routines. Feature importance pointed to the auditory verbal learning test, short-term memory binding tasks, and verbal and category fluency tasks as the variables of first-grade importance for distinguishing healthy cognition from aMCI. Conclusion: Although neuropsychological measures do not replace biomarkers’ utility, they are a relatively sensitive and specific diagnostic tool for aMCI. Further studies with ML must identify cognitive performance that differentiates conversion from average MCI to the pathological MCI observed in AD.
APA, Harvard, Vancouver, ISO, and other styles
45

Padarian, José, Alex B. McBratney, and Budiman Minasny. "Game theory interpretation of digital soil mapping convolutional neural networks." SOIL 6, no. 2 (August 18, 2020): 389–97. http://dx.doi.org/10.5194/soil-6-389-2020.

Full text
Abstract:
The use of complex models such as deep neural networks has yielded large improvements in predictive tasks in many fields including digital soil mapping. One of the concerns about using these models is that they are perceived as black boxes with low interpretability. In this paper we introduce the use of game theory, specifically Shapley additive explanations (SHAP) values, in order to interpret a digital soil mapping model. SHAP values represent the contribution of a covariate to the final model predictions. We applied this method to a multi-task convolutional neural network trained to predict soil organic carbon in Chile. The results show the contribution of each covariate to the model predictions in three different contexts: (a) at a local level, showing the contribution of the various covariates for a single prediction; (b) a global understanding of the covariate contribution; and (c) a spatial interpretation of their contributions. The latter constitutes a novel application of SHAP values and also the first detailed analysis of a model in a spatial context. The analysis of a SOC (soil organic carbon) model in Chile corroborated that the model is capturing sensible relationships between SOC and rainfall, temperature, elevation, slope, and topographic wetness index. The results agree with commonly reported relationships, highlighting environmental thresholds that coincide with significant areas within the study area. This contribution addresses the limitations of the current interpretation of models in digital soil mapping, especially in a spatial context. We believe that SHAP values are a valuable tool that should be included within the DSM (digital soil mapping) framework, since they address the important concerns regarding the interpretability of more complex models. The model interpretation is a crucial step that could lead to generating new knowledge to improve our understanding of soils.
APA, Harvard, Vancouver, ISO, and other styles
46

Hu, Chien-An, Chia-Ming Chen, Yen-Chun Fang, Shinn-Jye Liang, Hao-Chien Wang, Wen-Feng Fang, Chau-Chyun Sheu, et al. "Using a machine learning approach to predict mortality in critically ill influenza patients: a cross-sectional retrospective multicentre study in Taiwan." BMJ Open 10, no. 2 (February 2020): e033898. http://dx.doi.org/10.1136/bmjopen-2019-033898.

Full text
Abstract:
Objectives: Current mortality prediction models used in the intensive care unit (ICU) have a limited role for specific diseases such as influenza, and we aimed to establish an explainable machine learning (ML) model for predicting mortality in critically ill influenza patients using a real-world severe influenza data set. Study design: A cross-sectional retrospective multicentre study in Taiwan. Setting: Eight medical centres in Taiwan. Participants: A total of 336 patients requiring ICU admission for virology-proven influenza at eight hospitals during an influenza epidemic between October 2015 and March 2016. Primary and secondary outcome measures: We employed extreme gradient boosting (XGBoost) to establish the prediction model, compared the performance with logistic regression (LR) and random forest (RF), demonstrated the feature importance categorised by clinical domains, and used SHapley Additive exPlanations (SHAP) for visualised interpretation. Results: The data set contained 76 features of the 336 patients with severe influenza. The severity was apparently high, as shown by the high Acute Physiology and Chronic Health Evaluation II score (22, 17 to 29) and pneumonia severity index score (118, 88 to 151). The XGBoost model (area under the curve (AUC): 0.842; 95% CI 0.749 to 0.928) outperformed RF (AUC: 0.809; 95% CI 0.629 to 0.891) and LR (AUC: 0.701; 95% CI 0.573 to 0.825) for predicting 30-day mortality. To give clinicians an intuitive understanding of feature exploitation, we stratified features by clinical domain. The cumulative feature importance in the fluid balance domain, ventilation domain, laboratory data domain, demographic and symptom domain, management domain and severity score domain was 0.253, 0.113, 0.177, 0.140, 0.152 and 0.165, respectively.
We further used SHAP plots to illustrate associations between features and 30-day mortality in critically ill influenza patients. Conclusions: We used a real-world data set and applied an ML approach, mainly XGBoost, to establish a practical and explainable mortality prediction model in critically ill influenza patients.
APA, Harvard, Vancouver, ISO, and other styles
47

Pan, Derun, Renyi Liu, Bowen Zheng, Jianxiang Yuan, Hui Zeng, Zilong He, Zhendong Luo, Genggeng Qin, and Weiguo Chen. "Using Machine Learning to Unravel the Value of Radiographic Features for the Classification of Bone Tumors." BioMed Research International 2021 (March 11, 2021): 1–10. http://dx.doi.org/10.1155/2021/8811056.

Full text
Abstract:
Objectives. To build and validate random forest (RF) models for the classification of bone tumors based on the conventional radiographic features of the lesion and patients’ clinical characteristics, and identify the most essential features for the classification of bone tumors. Materials and Methods. In this retrospective study, 796 patients (benign bone tumors: 412 cases, malignant bone tumors: 215 cases, intermediate bone tumors: 169 cases) with pathologically confirmed bone tumors from Nanfang Hospital of Southern Medical University, Foshan Hospital of TCM, and University of Hong Kong-Shenzhen Hospital were enrolled. RF models were built to classify tumors as benign, malignant, or intermediate based on conventional radiographic features and potentially relevant clinical characteristics extracted by three musculoskeletal radiologists with ten years of experience. SHapley Additive exPlanations (SHAP) was used to identify the most essential features for the classification of bone tumors. The diagnostic performance of the RF models was quantified using receiver operating characteristic (ROC) curves. Results. The features extracted by the three radiologists had satisfactory agreement, and the minimum intraclass correlation coefficient (ICC) was 0.761 (CI: 0.686-0.824, P < .001). The binary and tertiary models were built to classify tumors as benign, malignant, or intermediate based on the imaging and clinical features from 627 and 796 patients, respectively. The AUCs of the binary (19 variables) and tertiary (22 variables) models were 0.97 and 0.94, respectively. The accuracies of the binary and tertiary models were 94.71% and 82.77%, respectively. In descending order, the most important features influencing classification in the binary model were margin, cortex involvement, and the pattern of bone destruction, and the most important features in the tertiary model were margin, high-density components, and cortex involvement. Conclusions.
This study developed interpretable models to classify bone tumors with great performance. These should allow radiographers to identify imaging features that are important for the classification of bone tumors in the clinical setting.
APA, Harvard, Vancouver, ISO, and other styles
48

Kokkotis, Christos, Serafeim Moustakidis, Vasilios Baltzopoulos, Giannis Giakas, and Dimitrios Tsaopoulos. "Identifying Robust Risk Factors for Knee Osteoarthritis Progression: An Evolutionary Machine Learning Approach." Healthcare 9, no. 3 (March 1, 2021): 260. http://dx.doi.org/10.3390/healthcare9030260.

Full text
Abstract:
Knee osteoarthritis (KOA) is a multifactorial disease which is responsible for more than 80% of the osteoarthritis disease’s total burden. KOA is heterogeneous in terms of rates of progression with several different phenotypes and a large number of risk factors, which often interact with each other. A number of modifiable and non-modifiable systemic and mechanical parameters along with comorbidities as well as pain-related factors contribute to the development of KOA. Although models exist to predict the onset of the disease or discriminate between asymptomatic and OA patients, there are just a few studies in the recent literature that focused on the identification of risk factors associated with KOA progression. This paper contributes to the identification of risk factors for KOA progression via a robust feature selection (FS) methodology that overcomes two crucial challenges: (i) the observed high dimensionality and heterogeneity of the available data that are obtained from the Osteoarthritis Initiative (OAI) database and (ii) a severe class imbalance problem posed by the fact that the KOA progressors class is significantly smaller than the non-progressors’ class. The proposed feature selection methodology relies on a combination of evolutionary algorithms and machine learning (ML) models, leading to the selection of a relatively small feature subset of 35 risk factors that generalizes well on the whole dataset (mean accuracy of 71.25%). We investigated the effectiveness of the proposed approach in a comparative analysis with well-known FS techniques with respect to metrics related to both prediction accuracy and generalization capability. The impact of the selected risk factors on the prediction output was further investigated using SHapley Additive exPlanations (SHAP).
The proposed FS methodology may contribute to the development of new, efficient risk stratification strategies and identification of risk phenotypes of each KOA patient to enable appropriate interventions.
APA, Harvard, Vancouver, ISO, and other styles
49

Quiroz, Juan Carlos, You-Zhen Feng, Zhong-Yuan Cheng, Dana Rezazadegan, Ping-Kang Chen, Qi-Ting Lin, Long Qian, et al. "Development and Validation of a Machine Learning Approach for Automated Severity Assessment of COVID-19 Based on Clinical and Imaging Data: Retrospective Study." JMIR Medical Informatics 9, no. 2 (February 11, 2021): e24572. http://dx.doi.org/10.2196/24572.

Full text
Abstract:
Background: COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible, such that resources can be mobilized and treatment can be escalated. Objective: This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data. Methods: Clinical data—including demographics, signs, symptoms, comorbidities, and blood test results—and chest computed tomography scans of 346 patients from 2 hospitals in the Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data from multiple machine learning models and further explored the use of four oversampling methods to address the imbalanced classification issue. Features with the highest predictive power were identified using the Shapley Additive Explanations framework. Results: Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929).
Conclusions: Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease.
APA, Harvard, Vancouver, ISO, and other styles
50

Lee, Hang-Lo, Jin-Seop Kim, Chang-Ho Hong, and Dong-Keun Cho. "Ensemble Learning Approach for the Prediction of Quantitative Rock Damage Using Various Acoustic Emission Parameters." Applied Sciences 11, no. 9 (April 28, 2021): 4008. http://dx.doi.org/10.3390/app11094008.

Full text
Abstract:
Monitoring rock damage subjected to cracks is an important stage in underground spaces such as radioactive waste disposal repositories, civil tunnels, and mining industries. The acoustic emission (AE) technique is one of the methods for monitoring rock damage and has been used by many researchers. To increase the accuracy of the evaluation and prediction of rock damage, it is necessary to consider various AE parameters, but this is difficult due to the complexity of the relationship between several AE parameters and rock damage. The purpose of this study is to propose a machine learning (ML)-based prediction model of quantitative rock damage taking into account combined features between several AE parameters. To achieve the goal, 10 granite samples from KAERI (Korea Atomic Energy Research Institute) in Daejeon were prepared, and a uniaxial compression test was conducted. To construct a model, random forest (RF) was employed and compared with support vector regression (SVR). The result showed that the generalization performance of RF is higher than that of SVR with an RBF kernel. The R2, RMSE, and MAPE of the RF for testing data are 0.989, 0.032, and 0.014, respectively, which are acceptable results for application at laboratory scale. As complementary work, parameter analysis was conducted by means of Shapley additive explanations (SHAP) for model interpretability. It was confirmed that the cumulative absolute energy and initiation frequency were selected as the main parameters at both high and low degrees of damage. This study suggests the possibility of extension to in situ application in subsequent research. Additionally, it provides information that the RF algorithm is a suitable technique and which parameters should be considered for predicting the degree of damage. In future work, we will extend the research to the engineering scale and consider the attenuation characteristics of rocks for practical application.
APA, Harvard, Vancouver, ISO, and other styles