Journal articles on the topic 'Unbalanced multi-class'

Author: Grafiati

Published: 4 June 2021

Last updated: 4 February 2022

Consult the top 16 journal articles for your research on the topic 'Unbalanced multi-class.'

1

Li, Dan, Wu Huang, Guobiao Xu, Tao Zhang, Zhonghui Jiang, and Xiao Wei. "Multi-class Unbalanced Data Classification for Sleep Staging." International Journal of Computer and Electrical Engineering 12, no. 2 (2020): 58–71. http://dx.doi.org/10.17706/ijcee.2020.12.2.58-71.

2

Zheng, Anbing, Huihua Yang, Xipeng Pan, Lihui Yin, and Yanchun Feng. "Identification of Multi-Class Drugs Based on Near Infrared Spectroscopy and Bidirectional Generative Adversarial Networks." Sensors 21, no. 4 (February 5, 2021): 1088. http://dx.doi.org/10.3390/s21041088.

Abstract:

Drug detection and identification technology are of great significance in drug supervision and management. To determine the exact source of drugs, it is often necessary to directly identify multiple varieties of drugs produced by multiple manufacturers. Near-infrared spectroscopy (NIR) combined with chemometrics is generally used in these cases. However, existing NIR classification modeling methods have great limitations in dealing with a large number of categories and spectra, especially under the premise of insufficient samples, unbalanced samples, and sensitive identification error cost. Therefore, this paper proposes a NIR multi-classification modeling method based on a modified Bidirectional Generative Adversarial Network (Bi-GAN). It makes full use of the powerful feature extraction ability and good sample generation quality of Bi-GAN and uses the generated samples with obvious features, an equal number between classes, and a sufficient number within classes to replace the unbalanced and insufficient real samples in the course of spectral classification. A total of 1721 samples of four kinds of drugs produced by 29 manufacturers were used as experimental materials, and the results demonstrate that this method is superior to other comparative methods in drug NIR classification scenarios, and the optimal accuracy rate is even more than 99% under ideal conditions.

3

Wang, Baoli, Jiye Liang, Yuhua Qian, and Chuangyin Dang. "A Normalized Numerical Scaling Method for the Unbalanced Multi-Granular Linguistic Sets." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 23, no. 02 (April 2015): 221–43. http://dx.doi.org/10.1142/s0218488515500099.

Abstract:

Decision makers often express their evaluations on decision problems with multi-granular linguistic terms. This fact leads to the unification of the multi-granular linguistic terms into a single linguistic set in the literature. However, this unification process increases the complexity of computation and the subjectivity in the determination of transformation functions. To overcome this deficiency, this paper aims to develop a normalized numerical scaling method for determining the semantics of multi-granular linguistic terms in the same domain. We first introduce a class of numerical scaling functions to generate several balanced or unbalanced linguistic sets. Since these scaled linguistic sets have different domains, we then develop a normalized numerical scaling method to form them into the unique interval [0,1]. As a result of this development, two classes of normalized scaling functions are derived from the priori scale information and applications of piecewise linear interpolation and piecewise arc interpolation. Finally, an example is given to illustrate how the method works.

4

Feng, Shiyao, Yanchun Liang, Wei Du, Wei Lv, and Ying Li. "LncLocation: Efficient Subcellular Location Prediction of Long Non-Coding RNA-Based Multi-Source Heterogeneous Feature Fusion." International Journal of Molecular Sciences 21, no. 19 (October 1, 2020): 7271. http://dx.doi.org/10.3390/ijms21197271.

Abstract:

Recent studies reveal that the subcellular location of long non-coding RNAs (lncRNAs) can provide significant information on their function. Due to the lack of experimental data, the number of lncRNAs with experimentally verified subcellular localization is very limited, and the numbers of lncRNAs located in different organelles are widely imbalanced. The prediction of the subcellular location of lncRNAs is therefore a multi-class, small-sample, imbalanced classification problem. The imbalance of data results in poor recognition by machine learning models on small data subsets, which is a puzzling and challenging problem in the existing research. In this study, we integrate multi-source features to construct a sequence-based computational tool, lncLocation, to predict the subcellular location of lncRNAs. An autoencoder is used to enhance part of the features, and a binomial distribution-based filtering method and recursive feature elimination (RFE) are used to filter some of the features. This improves the representation ability of the data and reduces the problem of unbalanced multi-classification data. Through comprehensive experiments on different feature combinations and machine learning models, we select the optimal features and classifier model scheme to construct the subcellular location prediction tool lncLocation. LncLocation obtains an 87.78% accuracy using 5-fold cross validation on the benchmark data, which is higher than the state-of-the-art tools, and the classification performance, especially for small class sets, is improved significantly.

5

Rasti, Behnood, Pedram Ghamisi, Peter Seidel, Sandra Lorenz, and Richard Gloaguen. "Multiple Optical Sensor Fusion for Mineral Mapping of Core Samples." Sensors 20, no. 13 (July 5, 2020): 3766. http://dx.doi.org/10.3390/s20133766.

Abstract:

Geological objects are characterized by a high complexity inherent to a strong compositional variability at all scales and usually unclear class boundaries. Therefore, dedicated processing schemes are required for the analysis of such data for mineralogical mapping. On the other hand, the variety of optical sensing technology reveals different data attributes and therefore multi-sensor approaches are adapted to solve such complicated mapping problems. In this paper, we devise an adapted multi-optical sensor fusion (MOSFus) workflow which takes the geological characteristics into account. The proposed processing chain exhaustively covers all relevant stages, including data acquisition, preprocessing, feature fusion, and mineralogical mapping. The concept includes (i) a spatial feature extraction based on morphological profiles on RGB data with high spatial resolution, (ii) a specific noise reduction applied on the hyperspectral data that assumes mixed sparse and Gaussian contamination, and (iii) a subsequent dimensionality reduction using a sparse and smooth low rank analysis. The feature extraction approach allows one to fuse heterogeneous data at variable resolutions, scales, and spectral ranges and improve classification substantially. The last step of the approach, an SVM classifier, is robust to unbalanced and sparse training sets and is particularly efficient with complex imaging data. We evaluate the performance of the procedure with two different multi-optical sensor datasets. The results demonstrate the superiority of this dedicated approach over common strategies.

6

KANG, Q., and C. I. VAHL. "Statistical procedures for testing hypotheses of equivalence in the safety evaluation of a genetically modified crop." Journal of Agricultural Science 154, no. 8 (January 22, 2016): 1392–412. http://dx.doi.org/10.1017/s0021859615001367.

Abstract:

Summary: Safety evaluation of a genetically modified crop entails assessing its equivalence to conventional crops under multi-site randomized block field designs. Despite mounting petitions for regulatory approval, there is a lack of a scientifically sound and powerful statistical method for establishing equivalence. The current paper develops and validates two procedures for testing a recently identified class of equivalence uniquely suited to crop safety. One procedure employs the modified large sample (MLS) method; the other is based on generalized pivotal quantities (GPQs). Because both methods were originally created under balanced designs, common issues associated with incomplete and unbalanced field designs were addressed by first identifying unfulfilled theoretical assumptions and then replacing them with user-friendly approximations. Simulation indicated that the MLS procedure could be very conservative on many occasions irrespective of the balance of the design; the GPQ procedure was mildly liberal with its type I error rate near the nominal level when the design is balanced. Additional pros and cons of these two procedures are also discussed. Their utility is demonstrated in a case study using summary statistics derived from a real-world dataset.

7

Chen, Binjie, Fushan Wei, and Chunxiang Gu. "Bitcoin Theft Detection Based on Supervised Machine Learning Algorithms." Security and Communication Networks 2021 (February 25, 2021): 1–10. http://dx.doi.org/10.1155/2021/6643763.

Abstract:

Since its inception, Bitcoin has been subject to numerous thefts due to its enormous economic value. Hackers steal Bitcoin wallet keys to transfer Bitcoin from compromised users, causing huge economic losses to victims. To address the security threat of Bitcoin theft, supervised learning methods were used in this study to detect and provide warnings about Bitcoin theft events. To overcome the shortcomings of the existing work, more comprehensive features of Bitcoin transaction data were extracted, the unbalanced dataset was equalized, and five supervised methods—the k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), adaptive boosting (AdaBoost), and multi-layer perceptron (MLP) techniques—as well as three unsupervised methods—the local outlier factor (LOF), one-class support vector machine (OCSVM), and Mahalanobis distance-based approach (MDB)—were used for detection. The best performer among these algorithms was the RF algorithm, which achieved recall, precision, and F1 values of 95.9%. The experimental results showed that the designed features are more effective than the currently used ones. The results of the supervised methods were significantly better than those of the unsupervised methods, and the results of the supervised methods could be further improved after equalizing the training set.

8

Gattei, Valter, Dania Benedetti, Daniela Marconi, Antonella Zucchetto, Michele Dal, Pietro Bulian, Giovanni Del Poeta, et al. "Gene Expression Profiling (GEP) of CD38-Expressing/Unmutated B-Cell Chronic Lymphocytic Leukemia (B-CLL) Cells by Using a Statistical Approach Suitable for Analysis of Unbalanced Datasets." Blood 108, no. 11 (November 1, 2006): 2089. http://dx.doi.org/10.1182/blood.v108.11.2089.2089.

Abstract:

B-CLL is an apparently homogeneous disease with variable clinical courses, which can be foreseen by the presence of mutated (M) or unmutated (UM) IgVH genes and the expression of prognostic markers, including CD38. Since a correlation between high CD38 and UM IgVH gene configuration has been described, we performed GEP to identify the gene signature of CD38+/UM B-CLLs. Purified (>95%) B-CLL cells from 44 cases were utilized for a dual-labeling GEP strategy (Operon Human Genome 2.1 OligoSet; 21,329 70mers) with pooled normal PB B-cells as common reference. Twelve B-CLLs were UM (<2% IgVH mutations) and CD38pos (CD38>30% of B-CLL cells), while 32 were M (>2% IgVH mutations) and CD38neg (CD38<10% of B-CLL cells). To discover genes differentially expressed in the two categories and overcome the problem of the unbalanced dataset, we applied an original bioinformatic approach called multi-SAM (Significance Analysis of Microarrays). This consists of repeated applications of SAM analysis comparing the less populated CD38pos/UM class with 1,000 random samplings, each of 12 cases, from the CD38neg/M class. For each single application of SAM, a list of differentially expressed genes (p<10^-3) was generated. At the end of 1,000 iterations, each single gene was labeled with a 0-1,000 list score (LS) based on the number of times it was selected by multi-SAM as differentially expressed. A significant LS threshold >300 was determined by applying multi-SAM to 1,000 random comparisons of two mock-classes, each of 12 cases, from the same dataset. The final gene list was further shrunk by keeping only the genes with a median-log-difference (MLD) between the two categories exceeding the absolute value of 1; eventually, a list of 132 genes (44 down-regulated and 88 up-regulated in CD38pos/UM cases) was obtained.
According to these analyses, CD38pos/UM B-CLLs overexpressed the following gene groups: i) genes related to lipid metabolism: mainly Lipoprotein Lipase (LS=744, MLD=2.05), but also low-density-lipoprotein receptor (LDLR) and LDLR-related-protein-5, the latter two with an LS>300 but lower (0.7) MLDs; ii) genes related to cell-cell/cell-matrix interactions: CD49d/alpha4 integrin (LS=354, MLD=1.14), a molecule whose expression has already been correlated with CD38 in our previous extensive surface antigen expression studies; the C-C chemokines MIP-1alpha (a.k.a. CCL3; LS=660, MLD=1.46) and MIP-1beta (a.k.a. CCL4; LS=334, MLD=1.36); CD72 (low-affinity CD100 ligand; LS=523, MLD=1.06); iii) genes related to vesicle trafficking/cytoskeleton reorganization: septin-7 (LS=386, MLD=1.19) and septin-10 (LS=926, MLD=3.12); the spastic paraplegia-20 protein (a.k.a. spartin; LS=886, MLD=1.84); iv) Activation-Induced Cytidine Deaminase (AICD; LS=599, MLD=2.07), a gene preliminarily found to be overexpressed in UM B-CLLs. Altogether, these genes, besides having clinical value as additional prognosticators, may be implicated in several aspects of the functional cross-talk between CD38pos/UM B-CLL and neighbouring cells within the lymph node microenvironment, this interplay eventually affecting survival of tumor cells.
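The repeated-subsampling idea described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`is_differential`, `list_scores`), the dictionary data layout, and the cutoff are hypothetical, and a simple Welch-style t statistic stands in for a real SAM call.

```python
import random
from statistics import mean, stdev


def is_differential(a, b, threshold=3.0):
    """Stand-in for one SAM call (assumption): Welch-style t statistic vs. a fixed cutoff."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    if se == 0:
        # No spread in either group: flag only if the means actually differ.
        return mean(a) != mean(b)
    return abs(mean(a) - mean(b)) / se > threshold


def list_scores(minority, majority, n_iter=1000, threshold=3.0, seed=0):
    """Count, per gene, how often it is flagged across balanced random subsamples.

    minority, majority: dict mapping gene name -> list of expression values,
    one value per case (hypothetical layout).
    """
    rng = random.Random(seed)
    n_min = len(next(iter(minority.values())))
    n_maj = len(next(iter(majority.values())))
    scores = {gene: 0 for gene in minority}
    for _ in range(n_iter):
        # Draw as many majority cases as there are minority cases.
        idx = rng.sample(range(n_maj), n_min)
        for gene in minority:
            subsample = [majority[gene][i] for i in idx]
            if is_differential(minority[gene], subsample, threshold):
                scores[gene] += 1
    return scores
```

A gene's list score is then compared against a cutoff (the abstract uses LS > 300 out of 1,000, calibrated on mock classes); that calibration step is not reproduced here.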

9

Bernadó-Mansilla, Ester, and Josep M. Garrell-Guiu. "Accuracy-Based Learning Classifier Systems: Models, Analysis and Applications to Classification Tasks." Evolutionary Computation 11, no. 3 (September 2003): 209–38. http://dx.doi.org/10.1162/106365603322365289.

Abstract:

Recently, Learning Classifier Systems (LCS) and particularly XCS have arisen as promising methods for classification tasks and data mining. This paper investigates two models of accuracy-based learning classifier systems on different types of classification problems. Departing from XCS, we analyze the evolution of a complete action map as a knowledge representation. We propose an alternative, UCS, which evolves a best action map more efficiently. We also investigate how the fitness pressure guides the search towards accurate classifiers. While XCS bases fitness on a reinforcement learning scheme, UCS defines fitness from a supervised learning scheme. We find significant differences in how the fitness pressure leads towards accuracy, and suggest the use of a supervised approach especially for multi-class problems and problems with unbalanced classes. We also investigate the complexity factors which arise in each type of accuracy-based LCS. We provide a model of the learning complexity of LCS which is based on the representative examples given to the system. The results and observations are also extended to a set of real world classification problems, where accuracy-based LCS are shown to perform competitively with respect to other learning algorithms. The work presents an extended analysis of accuracy-based LCS, gives insight into the understanding of the LCS dynamics, and suggests open issues for further improvement of LCS on classification tasks.

10

Wang, Fangru, and Catherine L. Ross. "Machine Learning Travel Mode Choices: Comparing the Performance of an Extreme Gradient Boosting Model with a Multinomial Logit Model." Transportation Research Record: Journal of the Transportation Research Board 2672, no. 47 (May 14, 2018): 35–45. http://dx.doi.org/10.1177/0361198118773556.

Abstract:

The multinomial logit (MNL) model and its variations have been dominating the travel mode choice modeling field for decades. Advantages of the MNL model include its elegant closed-form mathematical structure and its interpretable model estimation results based on random utility theory, while its main limitation is the strict statistical assumptions. Recent computational advancement has allowed easier application of machine learning models to travel behavior analysis, though research in this field is not thorough or conclusive. In this paper, we explore the application of the extreme gradient boosting (XGB) model to travel mode choice modeling and compare the result with an MNL model, using the Delaware Valley 2012 regional household travel survey data. The XGB model is an ensemble method based on the decision-tree algorithm and it has recently received a great deal of attention and use because of its high machine learning performance. The modeling and predicting results of the XGB model and the MNL model are compared by examining their multi-class predictive errors. We found that the XGB model has overall higher prediction accuracy than the MNL model especially when the dataset is not extremely unbalanced. The MNL model has great explanatory power and it also displays strong consistency between training and testing errors. Multiple trip characteristics, socio-demographic traits, and built-environment variables are found to be significantly associated with people’s mode choices in the region, but mode-specific travel time is found to be the most determinant factor for mode choice.

11

Ramadan, Imad Zeyad. "Panel Data Approach of the Firm’s Value Determinants: Evidence from the Jordanian Industrial Firms." Modern Applied Science 10, no. 5 (April 2, 2016): 163. http://dx.doi.org/10.5539/mas.v10n5p163.

Abstract:

This study aimed to investigate the main determinants of industrial firms' value in developing countries, namely Jordan. To achieve this goal, all 77 ASE-listed industrial firms for the period from 2000 to 2014 were utilized, resulting in 974 firm-year observations. Twelve firm-specific variables, namely, firm's size; firm's age; firm's risk level; firm's sales revenue; firm's operating cost; firm's tax rate; firm's net margin; firm's capital expenditure; firm's book value; firm's earnings per share; firm's dividend per share and firm's pay-out ratio, were tested as possible determinants of the firm's value. After testing for multicollinearity and heteroscedasticity, the result of the unbalanced panel data multi-regression model approach shows that the joint effect of the twelve potential determinants explains about 37% of the variation in the value of the Jordanian industrial firms listed at ASE (R-squared = 0.3682); therefore, firms in developing countries like Jordan should concentrate on these firm-specific variables in order to improve their value and thus the wealth of the shareholders. Another finding of the study is that the firm's risk level and tax rate are not statistically significant drivers of the Jordanian industrial firm's value. The findings on the effect of the firm's risk level and tax rate on the firm's value were contrary to those of Tiwari Ranjit et al (2015) and Rappaport (1998), respectively.

12

Mai, Antonello, Silvio Massa, Antonella Di Noia, Katija Jelicic, Elena Alfani, Cristina Di Rico, Angela Di Baldassarre, Anna Rita Migliaccio, and Giovanni Migliaccio. "Aroyl-Pyrrolyl-Hydroxy-Amides (APHAs), a Novel Family of Synthetic Histone Deacetylases Inhibitors, Are Potent Inducers of Human γ-Globin Gene Expression." Blood 104, no. 11 (November 16, 2004): 1216. http://dx.doi.org/10.1182/blood.v104.11.1216.1216.

Abstract:

Post-natal pharmacological reactivation of HbF, by restoring the unbalanced α/non-α globin chain production in red cells of patients affected by β-thalassemia or sickle cell anemia, represents a potential cure for these diseases. Many classes of compounds have been identified that are capable of inducing HbF synthesis in vitro by acting at different levels of the globin gene expression regulatory machinery. One of these classes is represented by inhibitors of a family of enzymes, the histone deacetylases (HDACs), involved in chromatin remodelling and gene transcription regulation. HDACs act in multi-protein complexes that remove acetyl groups from lysine residues on several proteins, including histones, and are divided into three distinct structural classes, depending on whether their catalytic activity is zinc (class I/II)- or NAD+ (class III)-dependent. The effects of the HDAC inhibitors identified so far on HbF synthesis are, however, modest and often associated with high toxicity. Therefore, the potential of their clinical use is unclear. We have recently described a new family of synthetic HDAC inhibitors, the Aroyl-pyrrolyl-hydroxy-amides (APHAs), that induce differentiation, growth arrest and/or apoptosis of transformed cells in culture [Mai A et al, J Med Chem 2004;47:1098]. In this study, we investigate the capability of 10 different APHA compounds to induce HbF in two in vitro assays. One assay is based on the ability of APHA compounds to activate either the human Aγ-driven Firefly (Aγ-F) or the β-promoter-driven Renilla Luciferase (β-R) reporter in GM979 cells stably transfected with a Dual Luciferase Reporter construct. The second assay is represented by the induction of γ-globin expression (by quantitative RT-PCR) in primary adult erythroblasts obtained in HEMA cultures of mononuclear cells from normal donors. The majority of the compounds tested did not significantly increase the Aγ-F/(Aγ-F+β-R) reporter ratio in GM979 cells.
However, the compound MC1575 increased the reporter ratio in GM979 cells by 3-fold (from 0.09 to 0.30) at a concentration of 20 μM, with modest effects on the proliferation activity of GM979 cells over the three days of the assay. When MC1575 was added at a concentration of 2–10 μM in cultures of primary adult erythroblasts induced to differentiate in serum-free media for 4 days, it induced a three-fold increase of the γ/(γ+β) globin ratio (from 0.04 to 0.12), with no apparent cellular toxicity. Among the HDAC inhibitors tested in this study, MC1575 was not the most potent inhibitor of total enzyme activity. However, it was the compound that most selectively inhibited the activity of the maize homologue of mammalian class IIa HDAC enzymes [Mai et al, J Med Chem 2003;46:4826]. These results are consistent with the hypothesis that each class of histone deacetylases might have a specific biological function and indicate that those of class IIa might represent the enzymes most specifically involved in globin gene regulation. We suggest that, by targeting the chemical inhibitors toward the catalytic domain of this class of enzymes, it should be possible to identify more specific, more potent and less toxic compounds for pharmacological treatment of β-thalassemia or sickle cell anemia.

13

Sokolov, A. V., and O. N. Zhdanov. "THE CLASS OF PERFECT TERNARY ARRAYS." «System analysis and applied information science», no. 2 (August 7, 2018): 47–54. http://dx.doi.org/10.21122/2309-4923-2018-2-47-54.

Abstract:

In recent decades, perfect algebraic constructions have been successfully used for signal system synthesis, for constructing block and stream cryptographic algorithms, and for creating pseudo-random sequence generators, as well as in many other fields of science and technology. Among perfect algebraic constructions, a significant place is occupied by bent-sequences and the class of perfect binary arrays associated with them. Bent-sequences are used for the development of modern cryptographic primitives, as well as for constructing constant amplitude codes (C-codes) used in code division multiple access technology. In turn, perfect binary arrays are used for constructing correction codes, systems of biphase phase-shifted signals and multi-level cryptographic systems. The development of methods of many-valued logic in modern information and communication systems has attracted the attention of researchers to the improvement of methods for synthesizing many-valued bent-sequences for cryptography and information transmission tasks. The new results obtained in the field of the synthesis of ternary bent-sequences make the problem of researching the class of perfect ternary arrays relevant. In this paper we consider the problem of extending the definition of perfect binary arrays to the three-valued logic case, as a result of which the definition of a perfect ternary array was introduced on the basis of the determination of the unbalance of the ternary function. A complete class of perfect ternary arrays of the third order is obtained by a regular method, bypassing the search. Thus, it is established that the class of perfect ternary arrays is a union of four subclasses, in each of which the corresponding methods of reproduction are determined. The paper establishes the relationship between the class of ternary bent-sequences and the class of perfect ternary arrays. The obtained results are the basis for the introduction of perfect ternary arrays into modern cryptographic and telecommunication algorithms.

14

"Multi-Label Classification with PSO based Synthetic Minority Over-Sampling Technique (Psosmote) for Imbalanced Samples." International Journal of Recent Technology and Engineering 8, no. 4 (November 30, 2019): 4039–42. http://dx.doi.org/10.35940/ijrte.d8437.118419.

Abstract:

Recently, learning from unbalanced data has emerged as a predominant problem in several applications, and, since multi-label classification is an evolving data mining task, learning from unbalanced multi-label data is being examined. However, the available SMOTE-based algorithms use the same sampling rate for every instance of the minority class, which leads to sub-optimal performance. To deal with this problem, a new Particle Swarm Optimization based SMOTE (PSOSMOTE) algorithm is proposed. The PSOSMOTE algorithm employs diverse sampling rates for multiple minority class instances and obtains a fusion of optimal sampling rates to deal with the classification of unbalanced datasets. Then, a Bayesian technique combined with Random Forest for multi-label classification (BARF-MLC) is used to address the inherent label dependencies among samples, and is considered alongside classifiers such as ML-FOREST, Predictive Clustering Trees (PCT), and Hierarchy of Multi Label Classifier (HOMER) using different metrics including precision, recall, F-measure, accuracy, and error rate.
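To make the notion of per-instance sampling rates concrete, here is a minimal sketch of SMOTE-style interpolation in which each minority sample gets its own number of synthetic neighbours. It is an illustration only, not the PSOSMOTE algorithm: the function name `smote_with_rates` and its parameters are hypothetical, the rates are supplied by hand, and the particle swarm search that would tune them is omitted.

```python
import math
import random


def smote_with_rates(minority, rates, k=3, seed=0):
    """Generate synthetic minority points by interpolation.

    minority: list of feature vectors (at least two points are required).
    rates: number of synthetic points to create from each minority point;
           in a PSO-based variant these would be the tuned quantities (assumption).
    """
    rng = random.Random(seed)
    synthetic = []
    for i, (x, n_new) in enumerate(zip(minority, rates)):
        # k nearest minority neighbours of x, excluding x itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(n_new):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Classic SMOTE corresponds to passing the same value for every entry of `rates`; varying the entries is what the abstract refers to as diverse sampling rates.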

15

Parlak, Bekir, and Alper Kursat Uysal. "The effects of globalisation techniques on feature selection for text classification." Journal of Information Science, June 18, 2020, 016555152093089. http://dx.doi.org/10.1177/0165551520930897.

Abstract:

Text classification (TC) is a very important and critical task in the 21st century, as there is a high volume of electronic data on the Internet. In TC, textual data are characterised by a huge number of highly sparse features/terms. A typical TC process consists of many steps, and one of the most important steps is undoubtedly feature selection (FS). In this study, we have comprehensively investigated the effects of various globalisation techniques on local feature selection (LFS) methods using datasets with different characteristics such as multi-class unbalanced (MCU), multi-class balanced (MCB), binary-class unbalanced (BCU) and binary-class balanced (BCB). The globalisation techniques used in this study are summation (SUM), weighted-sum (AVG), and maximum (MAX). To investigate the effect of globalisation techniques, we used three LFS methods, namely Discriminative Feature Selection (DFSS), odds ratio (OR) and chi-square (CHI2). In the experiments, we have utilised four different benchmark datasets, namely Reuters-21578, 20Newsgroup, Enron1, and Polarity, in addition to Support Vector Machines (SVM) and Decision Tree (DT) classifiers. According to the experimental results, the most successful globalisation technique is AVG when all situations are taken into account. The experimental results indicate that the DFSS method is more successful than the OR and CHI2 methods on datasets with MCU and MCB characteristics. However, the CHI2 method seems more accurate than the OR and DFSS methods on datasets with BCU and BCB characteristics. Also, the SVM classifier performed better than the DT classifier in most cases.
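As a rough illustration of what the three globalisation techniques do, the sketch below combines per-class (local) scores of a single term into one global score. The function name `globalise` and the data layout are hypothetical, and weighting the AVG variant by class prior probabilities is an assumption based on common usage rather than something stated in the abstract.

```python
def globalise(local_scores, class_priors, how="AVG"):
    """Combine per-class feature scores into a single global score.

    local_scores: dict mapping class label -> local score of one term.
    class_priors: dict mapping class label -> P(class), assumed to sum to 1.
    how: "SUM", "AVG" (prior-weighted sum, an assumption), or "MAX".
    """
    if how == "SUM":
        return sum(local_scores[c] for c in class_priors)
    if how == "AVG":
        return sum(class_priors[c] * local_scores[c] for c in class_priors)
    if how == "MAX":
        return max(local_scores[c] for c in class_priors)
    raise ValueError(f"unknown globalisation technique: {how}")
```

On an unbalanced dataset the prior-weighted variant lets frequent classes dominate the global score, while MAX lets a rare class promote a term on its own, which is one reason the choice of technique can matter for MCU and BCU data.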

16

Pustokhina, Irina V., Denis A. Pustokhin, Phong Thanh Nguyen, Mohamed Elhoseny, and K. Shankar. "Multi-objective rain optimization algorithm with WELM model for customer churn prediction in telecommunication sector." Complex & Intelligent Systems, April 5, 2021. http://dx.doi.org/10.1007/s40747-021-00353-6.

Abstract:

Customer retention is a major challenge in several business sectors, and diverse companies identify customer churn prediction (CCP) as an important process for retaining customers. CCP in the telecommunication sector has become an essential need owing to a rise in the number of telecommunication service providers. Recently, machine learning (ML) and deep learning (DL) models have begun to be used to develop effective CCP models. This paper presents a new improved synthetic minority over-sampling technique (SMOTE) with an optimal weighted extreme learning machine (OWELM), called the ISMOTE-OWELM model, for CCP. The presented model comprises preprocessing, balancing the unbalanced dataset, and classification. The multi-objective rain optimization algorithm (MOROA) is used for two purposes: determining the optimal sampling rate of SMOTE and parameter tuning of WELM. Initially, the customer data undergo data normalization and class labeling. Then, the ISMOTE is employed to handle the imbalanced dataset, where the rain optimization algorithm (ROA) is applied to determine the optimal sampling rate. Finally, the WELM model is applied to determine the class labels of the applied data. Extensive experimentation is carried out to validate the ISMOTE-OWELM model on the CCP Telecommunication dataset. The simulation outcome showed that the ISMOTE-OWELM model is superior to other models, with accuracies of 0.94, 0.92, and 0.909 on the applied datasets I, II, and III, respectively.
