Journal articles on the topic 'Gini impurity'

Consult the top 22 journal articles for your research on the topic 'Gini impurity.'

1

Yuan, Ye, Liji Wu, and Xiangmin Zhang. "Gini-Impurity Index Analysis." IEEE Transactions on Information Forensics and Security 16 (2021): 3154–69. http://dx.doi.org/10.1109/tifs.2021.3076932.

2

Singh, Sudhir Kumar, and Dr Vipin Saxena. "Reducing the Impurity of Object-Oriented Database Through Gini Index." International Journal of Computers & Technology 13, no. 11 (November 30, 2014): 5172–78. http://dx.doi.org/10.24297/ijct.v13i11.2787.

Abstract:
In the current scenario, database sizes are increasing because of audio and video files. Irregularities arise in a database when data is duplicated in many places, so the database needs to be reconstructed. The present work deals with reducing impurity through the well-known Gini index technique. Because much software uses object-oriented databases, a real object-oriented database for an Electricity Bill Deposit System is considered. A sample of 15 records is used; however, the technique can be applied to large or complex databases. A decision tree is constructed, sample queries are performed to verify the result, and the Gini index is computed to minimize impurity in the presented object-oriented database.
3

Singh, Vishwa Pratap, and R. L. Ujjwal. "Gini impurity based NDN cache pollution attack defence mechanism." Journal of Information and Optimization Sciences 41, no. 6 (August 17, 2020): 1353–63. http://dx.doi.org/10.1080/02522667.2020.1809092.

4

Jiang, Longquan, Bo Zhang, Qin Ni, Xuan Sun, and Pingping Dong. "Prediction of SNP Sequences via Gini Impurity Based Gradient Boosting Method." IEEE Access 7 (2019): 12647–57. http://dx.doi.org/10.1109/access.2019.2893269.

5

Zhi, Ting, Hongbin Luo, and Ying Liu. "A Gini Impurity-Based Interest Flooding Attack Defence Mechanism in NDN." IEEE Communications Letters 22, no. 3 (March 2018): 538–41. http://dx.doi.org/10.1109/lcomm.2018.2789896.

6

Kwon, Taeyong, and Sanghoo Yoon. "Design of rain gauge network using entropy and Gini impurity: A case study of Gangwon Province." Journal of the Korean Data And Information Science Society 31, no. 4 (July 31, 2020): 569–77. http://dx.doi.org/10.7465/jkdi.2020.31.4.569.

7

Laber, Eduardo, and Lucas Murtinho. "Minimization of Gini Impurity: NP-completeness and Approximation Algorithm via Connections with the k-means Problem." Electronic Notes in Theoretical Computer Science 346 (August 2019): 567–76. http://dx.doi.org/10.1016/j.entcs.2019.08.050.

8

Grabmeier, Johannes L., and Larry A. Lambe. "Decision trees for binary classification variables grow equally with the Gini impurity measure and Pearson's chi-square test." International Journal of Business Intelligence and Data Mining 2, no. 2 (2007): 213. http://dx.doi.org/10.1504/ijbidm.2007.013938.

9

Jia, Naizheng, Dian Zu, and Baojun Jia. "Influence Analysis in Music: Artists and Genres." Journal of Innovation and Social Science Research 8, no. 7 (July 30, 2021): 199–203. http://dx.doi.org/10.53469/jissr.2021.08(07).36.

Abstract:
Music is an important part of human civilization and has accompanied its emergence and development. As music develops, different genres and musicians' styles influence one another. It is therefore significant to understand and measure the influence of music influencers and their followers across genres. To identify the influence network of American music, a PageRank model was first proposed to calculate the influence of every artist in the network. According to the results, we built subnetworks to illustrate the influence of different sets. We then studied how to distinguish between genres using a random forest classifier, with Gini impurity used to determine feature importance. Our results show that Pop is the most influential genre and that acousticness is the most distinguishing feature.
10

Qu, Hongquan, Zhanli Fan, Shuqin Cao, Liping Pang, Hao Wang, and Jie Zhang. "A Study on Sensitive Bands of EEG Data under Different Mental Workloads." Algorithms 12, no. 7 (July 22, 2019): 145. http://dx.doi.org/10.3390/a12070145.

Abstract:
Electroencephalogram (EEG) signals contain a lot of human body performance information. With the development of the brain–computer interface (BCI) technology, many researchers have used the feature extraction and classification algorithms in various fields to study the feature extraction and classification of EEG signals. In this paper, the sensitive bands of EEG data under different mental workloads are studied. By selecting the characteristics of EEG signals, the bands with the highest sensitivity to mental loads are selected. In this paper, EEG signals are measured in different load flight experiments. First, the EEG signals are preprocessed by independent component analysis (ICA) to remove the interference of electrooculogram (EOG) signals, and then the power spectral density and energy are calculated for feature extraction. Finally, the feature importance is selected based on Gini impurity. The classification accuracy of the support vector machines (SVM) classifier is verified by comparing the characteristics of the full band with the characteristics of the β band. The results show that the characteristics of the β band are the most sensitive in EEG data under different mental workloads.
11

Wang, Chuyuan, Linxuan Zhang, and Chongdang Liu. "Adaptive scheduling method for dynamic robotic cell based on pattern classification algorithm." International Journal of Modeling, Simulation, and Scientific Computing 09, no. 05 (October 2018): 1850040. http://dx.doi.org/10.1142/s179396231850040x.

Abstract:
In order to deal with the dynamic production environment with frequent fluctuation of processing time, robotic cell needs an efficient scheduling strategy which meets the real-time requirements. This paper proposes an adaptive scheduling method based on pattern classification algorithm to guide the online scheduling process. The method obtains the scheduling knowledge of manufacturing system from the production data and establishes an adaptive scheduler, which can adjust the scheduling rules according to the current production status. In the process of establishing scheduler, how to choose essential attributes is the main difficulty. In order to solve the low performance and low efficiency problem of embedded feature selection method, based on the application of Extreme Gradient Boosting model (XGBoost) to obtain the adaptive scheduler, an improved hybrid optimization algorithm which integrates Gini impurity of XGBoost model into Particle Swarm Optimization (PSO) is employed to acquire the optimal subset of features. The results based on simulated robotic cell system show that the proposed PSO-XGBoost algorithm outperforms existing pattern classification algorithms and the newly learned adaptive model can improve the basic dispatching rules. At the same time, it can meet the demand of real-time scheduling.
12

Jananto, Arief, Sulastri Sulastri, Eko Nur Wahyudi, and Sunardi Sunardi. "Data Induk Mahasiswa sebagai Prediktor Ketepatan Waktu Lulus Menggunakan Algoritma CART Klasifikasi Data Mining." Jurnal Sisfokom (Sistem Informasi dan Komputer) 10, no. 1 (February 22, 2021): 71–78. http://dx.doi.org/10.32736/sisfokom.v10i1.991.

Abstract:
Fakultas Teknologi Informasi Universitas Stikubank (UNISBANK), as one of the faculties in higher education, has accumulated a large amount of stored data through its teaching activities and has graduated many students. The rate of on-time graduation is important to study programs as a measure of success. This research mines the stored student master data and graduation data to obtain the graduation rate and to predict graduation for active students. By applying a classification data mining technique with the CART algorithm, a decision tree is built to predict whether active students will graduate on time. Using graduation data and student master data totaling 1018 records, a decision tree model was obtained with an accuracy of 63% on the test data. Split nodes are determined using the Gini index, which partitions the dataset based on its impurity value. Tests conducted in this study show that the order of the variables in the decision tree is gender, origin school status, parental education, age at entry, city of birth, and parent's occupation. The resulting model predicts that 71% of active S1 Information Systems students and 51% of active S1 Informatics Engineering students can graduate on time.
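The abstract above describes choosing split nodes by Gini index. As a rough illustration of that idea (not the authors' implementation), the following minimal Python sketch computes the Gini impurity 1 - sum(p_k^2) of a set of labels and searches one numeric feature for the threshold with the lowest size-weighted impurity. The function names and the toy data are assumptions made for this example only.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of a two-way split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) for the best split `value <= threshold`.

    Returns (None, parent impurity) when no split improves on the parent node.
    """
    best = (None, gini_impurity(labels))
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = split_impurity(left, right)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: one numeric feature and a two-class label.
toy_values = [1, 2, 3, 4]
toy_labels = ["on_time", "on_time", "late", "late"]
toy_split = best_threshold(toy_values, toy_labels)
```

A CART-style learner would repeat this search over every feature at every node and recurse on the two resulting subsets; the sketch covers a single feature at a single node.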
13

Moualla, Soulaiman, Khaldoun Khorzom, and Assef Jafar. "Improving the Performance of Machine Learning-Based Network Intrusion Detection Systems on the UNSW-NB15 Dataset." Computational Intelligence and Neuroscience 2021 (June 15, 2021): 1–13. http://dx.doi.org/10.1155/2021/5557577.

Abstract:
Networks are exposed to an increasing number of cyberattacks due to their vulnerabilities. So, cybersecurity strives to make networks as safe as possible, by introducing defense systems to detect any suspicious activities. However, firewalls and classical intrusion detection systems (IDSs) suffer from continuous updating of their defined databases to detect threats. The new directions of the IDSs aim to leverage the machine learning models to design more robust systems with higher detection rates and lower false alarm rates. This research presents a novel network IDS, which plays an important role in network security and faces the current cyberattacks on networks using the UNSW-NB15 dataset benchmark. Our proposed system is a dynamically scalable multiclass machine learning-based network IDS. It consists of several stages based on supervised machine learning. It starts with the Synthetic Minority Oversampling Technique (SMOTE) method to solve the imbalanced classes problem in the dataset and then selects the important features for each class existing in the dataset by the Gini Impurity criterion using the Extremely Randomized Trees Classifier (Extra Trees Classifier). After that, a pretrained extreme learning machine (ELM) model is responsible for detecting the attacks separately, “One-Versus-All” as a binary classifier for each of them. Finally, the ELM classifier outputs become the inputs to a fully connected layer in order to learn from all their combinations, followed by a logistic regression layer to make soft decisions for all classes. Results show that our proposed system performs better than related works in terms of accuracy, false alarm rate, Receiver Operating Characteristic (ROC), and Precision-Recall Curves (PRCs).
14

Weidhaas, Joanne B., Donatello Telesca, Amar Upadhyaya Kishan, Ingrid Jenny Guldvik, Melanie-Birte Schulz-Jaavall, Andreas Stensvold, Phuoc T. Tran, and Wolfgang Lilleby. "MicroRNA-based biomarkers of the radiation response in prostate cancer." Journal of Clinical Oncology 38, no. 6_suppl (February 20, 2020): 163. http://dx.doi.org/10.1200/jco.2020.38.6_suppl.163.

Abstract:
Background: Intermediate and high-risk prostate cancer can be cured with radiation (RT) to the prostate and pelvic lymph nodes with androgen deprivation therapy (ADT), but both acute and late toxicity of the GU and GI systems are common. There are no biomarkers predicting radiation outcomes, limiting the opportunity to best personalize prostate radiation therapy. Methods: A prospectively enrolled single arm trial for locally advanced prostate cancer patients (T1-T4N0-N1M0) treated with definitive RT (74Gy IMRT) plus ADT was studied. Biologic samples were available in 108 of 138 patients. Toxicity was recorded using the RTOG morbidity grading system. We applied a panel of microRNA-based germline mutations shown to predict cancer therapy endpoints. Machine learning techniques were used to simultaneously identify prognostic features and perform classification of the biomarkers. Upsampling nested LOO-CV was used to assess performance and generality. Independent Fisher’s exact tests were performed to identify statistically significant marginal associations. Three classifiers were studied: logistic regression with elastic net regularization (EN-LR), classification trees (CT), and random forests (RF), with corresponding hyper-parameters of regularization weights (EN-LR), minimum split and bucket level sample size (CT), number of trees and mtry (RF). Normalized on the simplex, feature importance was defined as absolute value of regression weights for EN-LR, and cumulative decrease in Gini impurity for primary and surrogate splits at each node/split for CT and RF. Results: Grade 2 or higher toxicity included acute GI (11%), acute GU (34%), late GI (3%) and late GU (16%). GI and GU toxicity and acute and late toxicity had unique predictive biomarkers. The top three marginal genetic associations for late GU toxicity were microRNA site variants in CD6 and CD274 (PDL1) (p < 0.01) and BRCA2 (p = 0.014). Using RF, CT and EN-LR we could predict late GU toxicity with up to 70% sensitivity, 96% specificity, and 90% accuracy. Conclusions: We have identified microRNA-based biomarkers that can predict late GU toxicity. Work incorporating patient reported outcomes and to identify biomarkers for additional endpoints is ongoing.
15

Rahman, Quazi Abidur, Tahir Janmohamed, Hance Clarke, Paul Ritvo, Jane Heffernan, and Joel Katz. "Interpretability and Class Imbalance in Prediction Models for Pain Volatility in Manage My Pain App Users: Analysis Using Feature Selection and Majority Voting Methods." JMIR Medical Informatics 7, no. 4 (November 20, 2019): e15601. http://dx.doi.org/10.2196/15601.

Abstract:
Background: Pain volatility is an important factor in chronic pain experience and adaptation. Previously, we employed machine-learning methods to define and predict pain volatility levels from users of the Manage My Pain app. Reducing the number of features is important to help increase interpretability of such prediction models. Prediction results also need to be consolidated from multiple random subsamples to address the class imbalance issue. Objective: This study aimed to: (1) increase the interpretability of previously developed pain volatility models by identifying the most important features that distinguish high from low volatility users; and (2) consolidate prediction results from models derived from multiple random subsamples while addressing the class imbalance issue. Methods: A total of 132 features were extracted from the first month of app use to develop machine learning–based models for predicting pain volatility at the sixth month of app use. Three feature selection methods were applied to identify features that were significantly better predictors than other members of the large features set used for developing the prediction models: (1) Gini impurity criterion; (2) information gain criterion; and (3) Boruta. We then combined the three groups of important features determined by these algorithms to produce the final list of important features. Three machine learning methods were then employed to conduct prediction experiments using the selected important features: (1) logistic regression with ridge estimators; (2) logistic regression with least absolute shrinkage and selection operator; and (3) random forests. Multiple random under-sampling of the majority class was conducted to address class imbalance in the dataset. Subsequently, a majority voting approach was employed to consolidate prediction results from these multiple subsamples. The total number of users included in this study was 879, with a total number of 391,255 pain records. Results: A threshold of 1.6 was established using clustering methods to differentiate between 2 classes: low volatility (n=694) and high volatility (n=185). The overall prediction accuracy is approximately 70% for both random forests and logistic regression models when using 132 features. Overall, 9 important features were identified using 3 feature selection methods. Of these 9 features, 2 are from the app use category and the other 7 are related to pain statistics. After consolidating models that were developed using random subsamples by majority voting, logistic regression models performed equally well using 132 or 9 features. Random forests performed better than logistic regression methods in predicting the high volatility class. The consolidated accuracy of random forests does not drop significantly (601/879; 68.4% vs 618/879; 70.3%) when only 9 important features are included in the prediction model. Conclusions: We employed feature selection methods to identify important features in predicting future pain volatility. To address class imbalance, we consolidated models that were developed using multiple random subsamples by majority voting. Reducing the number of features did not result in a significant decrease in the consolidated prediction accuracy.
16

Özdemir, Utkan, and Gonca Al. "Çevre Korunmasında Atığın Atıkla Giderilmesi Prensibi / Principle Of Removal With Waste Of Waste In Environmental Protection." Journal of History Culture and Art Research 1, no. 4 (January 5, 2013): 373. http://dx.doi.org/10.7596/taksad.v1i4.74.

Abstract:
Considering environmental problems, the disposal of solid waste is one of the fundamental problems of many countries. Accordingly, the aim is to reuse part of the solid waste, which arises from different sources and varies greatly in kind, and thereby to provide economic benefit. The addition of ecological problems in water resources to the environmental pollution caused by solid waste signals greater risks for humanity. Reuse, which has an important place in solid waste disposal, has therefore begun to be discussed not only for its economic benefit but also for its contribution to the principle of removing waste with waste. In a world where the rate of consumption is steadily increasing, high treatment efficiencies have been observed when wastes, especially those of agricultural origin, are used as adsorbents in water treatment. Various subsequent studies have shown that, besides agricultural wastes such as banana peel, sunflower stalk, rice husk, and orange peel, wastes such as ash and sewage sludge also play an important role in removing certain organic and inorganic pollutants from water. This creates an opportunity to increase the usability of the adsorption process, which businesses often reject because of adsorbent cost and which has remained at pilot scale. Wider use of adsorption processes would provide high efficiencies in wastewater treatment and would also dispose of the wastes used as adsorbents. It also paves the way for wastes to be utilized in a similar manner in other processes. In this study, the types and capacities of waste-derived adsorbents that would provide economic and environmental benefits, particularly in industrial use, are evaluated comparatively.
17

Patil, Pramod, Alka Londhe, and Parag Kulkarni. "Learning Hyperplanes That Captures the Geometric Structure of Class Regions." Graduate Research in Engineering and Technology, July 2013, 7–12. http://dx.doi.org/10.47893/gret.2013.1003.

Abstract:
Most decision tree algorithms rely on impurity measures to evaluate the goodness of hyperplanes at each node while learning a decision tree in a top-down fashion. These impurity measures are not differentiable with respect to the hyperplane parameters, so algorithms that use them need some search technique to find the best hyperplane at every node. These impurity measures also do not properly capture the geometric structure of the data. Motivated by this, the paper proposes a two-class algorithm for learning oblique decision trees that evaluates hyperplanes in such a way that the (linear) geometric structure in the data is taken into consideration. At each node of the decision tree, the algorithm finds the clustering hyperplanes for both classes by solving a generalized eigenvalue problem. The data is then split based on an angle bisector, and the left and right sub-trees of the node are learned recursively. Since, in general, there will be two angle bisectors, the one that is better according to the Gini index impurity measure is selected. Thus the algorithm combines the ideas of linear tendencies in data and purity of nodes to find better decision trees. This idea leads to small decision trees and better performance.
19

Yu, Yun, Xi Wu, Jiu Chen, Gong Cheng, Xin Zhang, Cheng Wan, Jie Hu, et al. "Characterizing Brain Tumor Regions Using Texture Analysis in Magnetic Resonance Imaging." Frontiers in Neuroscience 15 (June 3, 2021). http://dx.doi.org/10.3389/fnins.2021.634926.

Abstract:
Purpose: To extract texture features from magnetic resonance imaging (MRI) scans of patients with brain tumors and use them to train a classification model for supporting an early diagnosis. Methods: Two groups of regions (control and tumor) were selected from MRI scans of 40 patients with meningioma or glioma. These regions were analyzed to obtain texture features. Statistical analysis was conducted using SPSS (version 20.0), including the Shapiro–Wilk test and Wilcoxon signed-rank test, which were used to test significant differences in each feature between the tumor and healthy regions. T-distributed stochastic neighbor embedding (t-SNE) was used to visualize the data distribution so as to avoid tumor selection bias. The Gini impurity index in random forests (RFs) was used to select the top five out of all features. Based on the five features, three classification models were built respectively with three machine learning classifiers: RF, support vector machine (SVM), and back propagation (BP) neural network. Results: Sixteen of the 25 features were significantly different between the tumor and healthy areas. Through the Gini impurity index in RFs, standard deviation, first-order moment, variance, third-order absolute moment, and third-order central moment were selected to build the classification model. The classification model trained using the SVM classifier achieved the best performance, with sensitivity, specificity, and area under the curve of 94.04%, 92.3%, and 0.932, respectively. Conclusion: Texture analysis with an SVM classifier can help differentiate between brain tumor and healthy areas with high speed and accuracy, which would facilitate its clinical application.
20

Ballante, Elena, Marta Galvani, Pierpaolo Uberti, and Silvia Figini. "Polarized Classification Tree Models: Theory and Computational Aspects." Journal of Classification, February 24, 2021. http://dx.doi.org/10.1007/s00357-021-09383-8.

Abstract:
In this paper, a new approach in classification models, called the Polarized Classification Tree model, is introduced. From a methodological perspective, a new index of polarization to measure the goodness of splits in the growth of a classification tree is proposed. The newly introduced measure tackles weaknesses of the classical ones used in classification trees (Gini and Information Gain), because it does not only measure the impurity but also reflects the distribution of each covariate in the node, i.e., it employs more discriminating covariates to split the data at each node. From a computational perspective, a new algorithm is proposed and implemented employing the new proposed measure in the growth of a tree. In order to show how our proposal works, a simulation exercise has been carried out. The results obtained in the simulation framework suggest that our proposal significantly outperforms impurity measures commonly adopted in classification tree modeling. Moreover, the empirical evidence on real data shows that Polarized Classification Tree models are competitive and sometimes better with respect to classical classification tree models.
21

Kaissis, Georgios, Sebastian Ziegelmayer, Fabian Lohöfer, Hana Algül, Matthias Eiber, Wilko Weichert, Roland Schmid, et al. "A machine learning model for the prediction of survival and tumor subtype in pancreatic ductal adenocarcinoma from preoperative diffusion-weighted imaging." European Radiology Experimental 3, no. 1 (October 17, 2019). http://dx.doi.org/10.1186/s41747-019-0119-0.

Abstract:
Background: To develop a supervised machine learning (ML) algorithm predicting above- versus below-median overall survival (OS) from diffusion-weighted imaging-derived radiomic features in patients with pancreatic ductal adenocarcinoma (PDAC). Methods: One hundred two patients with histopathologically proven PDAC were retrospectively assessed as training cohort, and 30 prospectively accrued and retrospectively enrolled patients served as independent validation cohort (IVC). Tumors were segmented on preoperative apparent diffusion coefficient (ADC) maps, and radiomic features were extracted. A random forest ML algorithm was fit to the training cohort and tested in the IVC. Histopathological subtype of tumor samples was assessed by immunohistochemistry in 21 IVC patients. Individual radiomic feature importance was evaluated by assessment of tree node Gini impurity decrease and recursive feature elimination. Fisher’s exact test, 95% confidence intervals (CI), and receiver operating characteristic area under the curve (ROC-AUC) were used. Results: The ML algorithm achieved 87% sensitivity (95% CI 67.3–92.7), 80% specificity (95% CI 74.0–86.7), and ROC-AUC 90% for the prediction of above- versus below-median OS in the IVC. Heterogeneity-related features were highly ranked by the model. Of the 21 patients with determined histopathological subtype, 8/9 patients predicted to experience below-median OS exhibited the quasi-mesenchymal subtype, whilst 11/12 patients predicted to experience above-median OS exhibited a non-quasi-mesenchymal subtype (p < 0.001). Conclusion: ML application to ADC radiomics allowed OS prediction with a high diagnostic accuracy in an IVC. The high overlap of clinically relevant histopathological subtypes with model predictions underlines the potential of quantitative imaging in PDAC pre-operative subtyping and prognosis.
22

Kosztin, A., W. R. Schwertner, M. Tokodi, Z. S. Toser, A. Kovacs, B. Veres, E. Zima, L. Geller, and B. Merkely. "P1631 Machine-learning defined predictors of mortality in ischemic and non-ischemic heart failure patients undergoing CRT-P or CRT-D implantation." European Heart Journal 40, Supplement_1 (October 1, 2019). http://dx.doi.org/10.1093/eurheartj/ehz748.0390.

Abstract:
Background: Both Cardiac Resynchronization Therapy Pacemakers (CRT-P) and CRT Defibrillators (CRT-D) improve mortality in heart failure patients with reduced ejection fraction and wide QRS complex. However, not every patient benefits equally from each type of treatment and determinants of mortality may vary across the subgroups of patients with different etiologies and devices. Purpose: Our aim was to investigate the differences in the predictors of long-term mortality in heart failure patients with different etiologies undergoing CRT-P or CRT-D implantation using machine learning. Methods: We created 4 separate random forest models to predict 5-year all-cause mortality (models for ischemic and non-ischemic etiology in both CRT-P and CRT-D subgroups). A registry of 1650 patients (66±10 years, 1258 [76%] males, 751 [46%] CRT-D) was used as the training cohort for the prediction models. Forty-seven pre-implant parameters including cardiovascular risk factors and clinical variables were utilized to train our models. For each clinical parameter, we calculated the mean decrease in Gini impurity (dG). Based on the extent of decline, the 10 most important features were selected for each model. To keep the data comparable between the different models, we took the union of these features and plotted the results on radar charts. Results: There were 879 (53%) deaths during the follow-up period. The mortality benefit of adding an Implantable Cardioverter Defibrillator could be observed only in ischemic patients (Hazard Ratio = 0.83, 95% Confidence Interval: 0.72–0.97, p<0.005), but not in the entire cohort or in patients with non-ischemic etiology. In patients with non-ischemic etiology, the pattern of mortality predictors was almost similar: in CRT-P patients the most important predictors were age, serum urea levels and left ventricular ejection fraction (LVEF) (dG: 0.114, 0.054 and 0.053, respectively) whereas in the CRT-D subgroup these factors were age, LVEF and serum sodium (dG: 0.116, 0.060 and 0.052, respectively). In CRT-P patients with non-ischemic etiology, the most relevant variables were age, serum urea and LVEF in decreasing order (dG: 0.085, 0.060 and 0.050, respectively). The strongest predictors of mortality were age, hemoglobin and serum creatinine in ischemic patients with CRT-D (dG: 0.088, 0.060 and 0.052, respectively). [Figure: CRT-P vs. CRT-D by ischemic etiology] Conclusions: In patients with ischemic heart failure, CRT-D was associated with a mortality benefit compared to CRT-P. Our results also suggest that machine-learning may identify distinct patterns in clinical characteristics for a better mortality prediction. Taking these factors into consideration during the management of heart failure patients with CRT, risk stratification and outcomes could be improved.
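Several abstracts in this list rank features by their decrease in Gini impurity. The sketch below is a deliberately simplified illustration of that idea and is not taken from any of the cited papers: it measures, for each feature, the largest impurity drop a single threshold split achieves at one node and normalizes the results to sum to 1. A random forest's mean decrease in Gini impurity would instead accumulate such drops over every node of every tree. All names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_decrease(column, labels):
    """Largest drop in Gini impurity from one split `value <= threshold`."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for t in sorted(set(column))[:-1]:
        left = [y for x, y in zip(column, labels) if x <= t]
        right = [y for x, y in zip(column, labels) if x > t]
        child = len(left) / n * gini(left) + len(right) / n * gini(right)
        best = max(best, parent - child)
    return best


def single_node_importances(rows, labels, names):
    """Per-feature impurity decrease at one node, normalized to sum to 1."""
    raw = {
        name: best_decrease([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    total = sum(raw.values())
    return {name: (value / total if total else 0.0) for name, value in raw.items()}


# Hypothetical toy data: two numeric features and a two-class outcome.
toy_rows = [(50, 140), (55, 135), (70, 141), (75, 136)]
toy_labels = ["a", "a", "b", "b"]
toy_importances = single_node_importances(toy_rows, toy_labels, ["feature_1", "feature_2"])
```

In this toy data `feature_1` separates the classes perfectly and `feature_2` only partially, so `feature_1` receives the larger share of the normalized importance.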