Academic literature on the topic 'Unbalanced multi-class'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Unbalanced multi-class.'

Journal articles on the topic "Unbalanced multi-class"

1

Li, Dan, Wu Huang, Guobiao Xu, Tao Zhang, Zhonghui Jiang, and Xiao Wei. "Multi-class Unbalanced Data Classification for Sleep Staging." International Journal of Computer and Electrical Engineering 12, no. 2 (2020): 58–71. http://dx.doi.org/10.17706/ijcee.2020.12.2.58-71.

2

Zheng, Anbing, Huihua Yang, Xipeng Pan, Lihui Yin, and Yanchun Feng. "Identification of Multi-Class Drugs Based on Near Infrared Spectroscopy and Bidirectional Generative Adversarial Networks." Sensors 21, no. 4 (February 5, 2021): 1088. http://dx.doi.org/10.3390/s21041088.

Abstract:
Drug detection and identification technology are of great significance in drug supervision and management. To determine the exact source of drugs, it is often necessary to directly identify multiple varieties of drugs produced by multiple manufacturers. Near-infrared spectroscopy (NIR) combined with chemometrics is generally used in these cases. However, existing NIR classification modeling methods have great limitations in dealing with a large number of categories and spectra, especially under the premise of insufficient samples, unbalanced samples, and sensitive identification error cost. Therefore, this paper proposes a NIR multi-classification modeling method based on a modified Bidirectional Generative Adversarial Network (Bi-GAN). It makes full use of the powerful feature extraction ability and good sample generation quality of Bi-GAN and uses the generated samples with obvious features, an equal number between classes, and a sufficient number within classes to replace the unbalanced and insufficient real samples in the course of spectral classification. 1721 samples of four kinds of drugs produced by 29 manufacturers were used as experimental materials, and the results demonstrate that this method is superior to other comparative methods in drug NIR classification scenarios, and the optimal accuracy rate is even more than 99% under ideal conditions.
3

Wang, Baoli, Jiye Liang, Yuhua Qian, and Chuangyin Dang. "A Normalized Numerical Scaling Method for the Unbalanced Multi-Granular Linguistic Sets." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 23, no. 02 (April 2015): 221–43. http://dx.doi.org/10.1142/s0218488515500099.

Abstract:
Decision makers often express their evaluations on decision problems with multi-granular linguistic terms. This fact leads to the unification of the multi-granular linguistic terms into a single linguistic set in the literature. However, this unification process increases the complexity of computation and the subjectivity in the determination of transformation functions. To overcome this deficiency, this paper aims to develop a normalized numerical scaling method for determining the semantics of multi-granular linguistic terms in the same domain. We first introduce a class of numerical scaling functions to generate several balanced or unbalanced linguistic sets. Since these scaled linguistic sets have different domains, we then develop a normalized numerical scaling method to form them into the unique interval [0,1]. As a result of this development, two classes of normalized scaling functions are derived from the priori scale information and applications of piecewise linear interpolation and piecewise arc interpolation. Finally, an example is given to illustrate how the method works.
4

Feng, Shiyao, Yanchun Liang, Wei Du, Wei Lv, and Ying Li. "LncLocation: Efficient Subcellular Location Prediction of Long Non-Coding RNA-Based Multi-Source Heterogeneous Feature Fusion." International Journal of Molecular Sciences 21, no. 19 (October 1, 2020): 7271. http://dx.doi.org/10.3390/ijms21197271.

Abstract:
Recent studies have revealed that the subcellular location of long non-coding RNAs (lncRNAs) can provide significant information on their function. Due to the lack of experimental data, the number of lncRNAs with experimentally verified subcellular localization is very limited, and the numbers of lncRNAs located in different organelles are wildly imbalanced. The prediction of subcellular location of lncRNAs is actually a multi-classification small sample imbalance problem. The imbalance of data results in the poor recognition effect of machine learning models on small data subsets, which is a puzzling and challenging problem in the existing research. In this study, we integrate multi-source features to construct a sequence-based computational tool, lncLocation, to predict the subcellular location of lncRNAs. An autoencoder is used to enhance part of the features, and the binomial distribution-based filtering method and recursive feature elimination (RFE) are used to filter some of the features. It improves the representation ability of data and reduces the problem of unbalanced multi-classification data. By comprehensive experiments on different feature combinations and machine learning models, we select the optimal features and classifier model scheme to construct a subcellular location prediction tool, lncLocation. LncLocation can obtain an 87.78% accuracy using 5-fold cross validation on the benchmark data, which is higher than the state-of-the-art tools, and the classification performance, especially for small class sets, is improved significantly.
5

Rasti, Behnood, Pedram Ghamisi, Peter Seidel, Sandra Lorenz, and Richard Gloaguen. "Multiple Optical Sensor Fusion for Mineral Mapping of Core Samples." Sensors 20, no. 13 (July 5, 2020): 3766. http://dx.doi.org/10.3390/s20133766.

Abstract:
Geological objects are characterized by a high complexity inherent to a strong compositional variability at all scales and usually unclear class boundaries. Therefore, dedicated processing schemes are required for the analysis of such data for mineralogical mapping. On the other hand, the variety of optical sensing technology reveals different data attributes and therefore multi-sensor approaches are adapted to solve such complicated mapping problems. In this paper, we devise an adapted multi-optical sensor fusion (MOSFus) workflow which takes the geological characteristics into account. The proposed processing chain exhaustively covers all relevant stages, including data acquisition, preprocessing, feature fusion, and mineralogical mapping. The concept includes (i) a spatial feature extraction based on morphological profiles on RGB data with high spatial resolution, (ii) a specific noise reduction applied on the hyperspectral data that assumes mixed sparse and Gaussian contamination, and (iii) a subsequent dimensionality reduction using a sparse and smooth low rank analysis. The feature extraction approach allows one to fuse heterogeneous data at variable resolutions, scales, and spectral ranges and improve classification substantially. The last step of the approach, an SVM classifier, is robust to unbalanced and sparse training sets and is particularly efficient with complex imaging data. We evaluate the performance of the procedure with two different multi-optical sensor datasets. The results demonstrate the superiority of this dedicated approach over common strategies.
6

KANG, Q., and C. I. VAHL. "Statistical procedures for testing hypotheses of equivalence in the safety evaluation of a genetically modified crop." Journal of Agricultural Science 154, no. 8 (January 22, 2016): 1392–412. http://dx.doi.org/10.1017/s0021859615001367.

Abstract:
Safety evaluation of a genetically modified crop entails assessing its equivalence to conventional crops under multi-site randomized block field designs. Despite mounting petitions for regulatory approval, there is a lack of a scientifically sound and powerful statistical method for establishing equivalence. The current paper develops and validates two procedures for testing a recently identified class of equivalence uniquely suited to crop safety. One procedure employs the modified large sample (MLS) method; the other is based on generalized pivotal quantities (GPQs). Because both methods were originally created under balanced designs, common issues associated with incomplete and unbalanced field designs were addressed by first identifying unfulfilled theoretical assumptions and then replacing them with user-friendly approximations. Simulation indicated that the MLS procedure could be very conservative on many occasions irrespective of the balance of the design; the GPQ procedure was mildly liberal with its type I error rate near the nominal level when the design is balanced. Additional pros and cons of these two procedures are also discussed. Their utility is demonstrated in a case study using summary statistics derived from a real-world dataset.
7

Chen, Binjie, Fushan Wei, and Chunxiang Gu. "Bitcoin Theft Detection Based on Supervised Machine Learning Algorithms." Security and Communication Networks 2021 (February 25, 2021): 1–10. http://dx.doi.org/10.1155/2021/6643763.

Abstract:
Since its inception, Bitcoin has been subject to numerous thefts due to its enormous economic value. Hackers steal Bitcoin wallet keys to transfer Bitcoin from compromised users, causing huge economic losses to victims. To address the security threat of Bitcoin theft, supervised learning methods were used in this study to detect and provide warnings about Bitcoin theft events. To overcome the shortcomings of the existing work, more comprehensive features of Bitcoin transaction data were extracted, the unbalanced dataset was equalized, and five supervised methods—the k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), adaptive boosting (AdaBoost), and multi-layer perceptron (MLP) techniques—as well as three unsupervised methods—the local outlier factor (LOF), one-class support vector machine (OCSVM), and Mahalanobis distance-based approach (MDB)—were used for detection. The best performer among these algorithms was the RF algorithm, which achieved recall, precision, and F1 values of 95.9%. The experimental results showed that the designed features are more effective than the currently used ones. The results of the supervised methods were significantly better than those of the unsupervised methods, and the results of the supervised methods could be further improved after equalizing the training set.
8

Gattei, Valter, Dania Benedetti, Daniela Marconi, Antonella Zucchetto, Michele Dal, Pietro Bulian, Giovanni Del Poeta, et al. "Gene Expression Profiling (GEP) of CD38-Expressing/Unmutated B-Cell Chronic Lymphocytic Leukemia (B-CLL) Cells by Using a Statistical Approach Suitable for Analysis of Unbalanced Datasets." Blood 108, no. 11 (November 1, 2006): 2089. http://dx.doi.org/10.1182/blood.v108.11.2089.2089.

Abstract:
B-CLL is an apparently homogeneous disease with variable clinical courses, which can be foreseen by the presence of mutated (M) or unmutated (UM) IgVH genes and the expression of prognostic markers, including CD38. Since a correlation between high CD38 and UM IgVH gene configuration has been described, we performed GEP to identify the gene signature of CD38+/UM B-CLLs. Purified (>95%) B-CLL cells from 44 cases were utilized for a dual-labeling GEP strategy (Operon Human Genome 2.1 OligoSet; 21,329 70mers) with pooled normal PB B-cells as common reference. 12 B-CLLs were UM (<2% IgVH mutations) and CD38pos (CD38>30% of B-CLL cells), while 32 were M (>2% IgVH mutations) and CD38neg (CD38<10% of B-CLL cells). To discover genes differentially expressed in the two categories and overcome the problem of an unbalanced dataset, we applied an original bioinformatic approach called multi-SAM (Significance Analysis of Microarrays). This consists of reiterated applications of SAM analysis comparing the less populated CD38pos/UM class with 1,000 random samplings, each of 12 cases, from the CD38neg/M class. For each single application of SAM, a list of differentially expressed genes (p<10^-3) was generated. At the end of 1,000 reiterations, each single gene was labeled with a 0-1,000 list score (LS) based on the times it was selected by multi-SAM as differentially expressed. A significant LS threshold>300 was determined by applying multi-SAM to 1,000 random comparisons of two mock-classes, each of 12 cases, from the same dataset. The final gene list was further shrunk by keeping only the genes with a median-log-difference (MLD) between the two categories exceeding the absolute value of 1; eventually, a list of 132 genes (44 down-regulated and 88 up-regulated in CD38pos/UM cases) was obtained.
According to these analyses, CD38pos/UM B-CLLs overexpressed the following gene groups: i) genes related to lipid metabolism: mainly Lipoprotein Lipase (LS=744, MLD=2.05), but also low-density-lipoprotein receptor (LDLR) and LDLR-related-protein-5, these latter with a LS>300 but lower (0.7) MLDs. ii) genes related to cell-cell/cell-matrix interactions: CD49d/alpha4 integrin (LS=354, MLD=1.14), a molecule whose expression has already been correlated with CD38 in previous extensive surface antigen expression studies of ours; the C-C chemokines MIP-1alpha (a.k.a. CCL3; LS=660, MLD=1.46) and MIP-1beta (a.k.a. CCL4; LS=334, MLD=1.36); CD72 (low-affinity CD100 ligand; LS=523, MLD=1.06). iii) genes related to vesicle trafficking/cytoskeleton reorganization: septin-7 (LS=386, MLD=1.19) and septin-10 (LS=926, MLD=3.12); the spastic paraplegia-20 protein (a.k.a. spartin, LS=886, MLD=1.84); iv) Activation-Induced Cytidine Deaminase (AICD; LS=599, MLD=2.07), a gene preliminarily found to be overexpressed in UM B-CLLs. Altogether, these genes, besides having clinical value as additional prognosticators, may be implicated in several aspects of the functional cross-talk between CD38pos/UM B-CLL and neighbouring cells within the lymph node microenvironment, this interplay eventually affecting survival of tumor cells.
9

Bernadó-Mansilla, Ester, and Josep M. Garrell-Guiu. "Accuracy-Based Learning Classifier Systems: Models, Analysis and Applications to Classification Tasks." Evolutionary Computation 11, no. 3 (September 2003): 209–38. http://dx.doi.org/10.1162/106365603322365289.

Abstract:
Recently, Learning Classifier Systems (LCS) and particularly XCS have arisen as promising methods for classification tasks and data mining. This paper investigates two models of accuracy-based learning classifier systems on different types of classification problems. Departing from XCS, we analyze the evolution of a complete action map as a knowledge representation. We propose an alternative, UCS, which evolves a best action map more efficiently. We also investigate how the fitness pressure guides the search towards accurate classifiers. While XCS bases fitness on a reinforcement learning scheme, UCS defines fitness from a supervised learning scheme. We find significant differences in how the fitness pressure leads towards accuracy, and suggest the use of a supervised approach especially for multi-class problems and problems with unbalanced classes. We also investigate the complexity factors which arise in each type of accuracy-based LCS. We provide a model on the learning complexity of LCS which is based on the representative examples given to the system. The results and observations are also extended to a set of real world classification problems, where accuracy-based LCS are shown to perform competitively with respect to other learning algorithms. The work presents an extended analysis of accuracy-based LCS, gives insight into the understanding of the LCS dynamics, and suggests open issues for further improvement of LCS on classification tasks.
10

Wang, Fangru, and Catherine L. Ross. "Machine Learning Travel Mode Choices: Comparing the Performance of an Extreme Gradient Boosting Model with a Multinomial Logit Model." Transportation Research Record: Journal of the Transportation Research Board 2672, no. 47 (May 14, 2018): 35–45. http://dx.doi.org/10.1177/0361198118773556.

Abstract:
The multinomial logit (MNL) model and its variations have been dominating the travel mode choice modeling field for decades. Advantages of the MNL model include its elegant closed-form mathematical structure and its interpretable model estimation results based on random utility theory, while its main limitation is the strict statistical assumptions. Recent computational advancement has allowed easier application of machine learning models to travel behavior analysis, though research in this field is not thorough or conclusive. In this paper, we explore the application of the extreme gradient boosting (XGB) model to travel mode choice modeling and compare the result with an MNL model, using the Delaware Valley 2012 regional household travel survey data. The XGB model is an ensemble method based on the decision-tree algorithm and it has recently received a great deal of attention and use because of its high machine learning performance. The modeling and predicting results of the XGB model and the MNL model are compared by examining their multi-class predictive errors. We found that the XGB model has overall higher prediction accuracy than the MNL model especially when the dataset is not extremely unbalanced. The MNL model has great explanatory power and it also displays strong consistency between training and testing errors. Multiple trip characteristics, socio-demographic traits, and built-environment variables are found to be significantly associated with people’s mode choices in the region, but mode-specific travel time is found to be the most determinant factor for mode choice.

Dissertations / Theses on the topic "Unbalanced multi-class"

1

Shan, Liang. "Joint Gaussian Graphical Model for multi-class and multi-level data." Diss., Virginia Tech, 2016. http://hdl.handle.net/10919/81412.

Abstract:
The Gaussian graphical model has been a popular tool to investigate conditional dependency between random variables by estimating sparse precision matrices. The estimated precision matrices could be mapped into networks for visualization. For related but different classes, jointly estimating networks by taking advantage of common structure across classes can help us better estimate conditional dependencies among variables. Furthermore, there may exist multilevel structure among variables; some variables are considered as higher level variables and others are nested in these higher level variables, which are called lower level variables. In this dissertation, we made several contributions to the area of joint estimation of Gaussian graphical models across heterogeneous classes: the first is to propose a joint estimation method for estimating Gaussian graphical models across unbalanced multi-classes, whereas the second considers multilevel variable information during the joint estimation procedure and simultaneously estimates higher level network and lower level network. For the first project, we consider the problem of jointly estimating Gaussian graphical models across unbalanced multi-class. Most existing methods require equal or similar sample size among classes. However, many real applications do not have similar sample sizes. Hence, in this dissertation, we propose the joint adaptive graphical lasso, a weighted L1 penalized approach, for unbalanced multi-class problems. Our joint adaptive graphical lasso approach combines information across classes so that their common characteristics can be shared during the estimation process. We also introduce regularization into the adaptive term so that the unbalancedness of data is taken into account. Simulation studies show that our approach performs better than existing methods in terms of false positive rate, accuracy, Matthews correlation coefficient, and false discovery rate.
We demonstrate the advantage of our approach using a liver cancer data set. For the second one, we propose a method to jointly estimate the multilevel Gaussian graphical models across multiple classes. Currently, methods are still limited to investigating a single level conditional dependency structure when there exists a multilevel structure among variables. Because higher level variables may work together to accomplish certain tasks, simultaneously exploring conditional dependency structures among higher level variables and among lower level variables is our main interest. Given multilevel data from heterogeneous classes, our method assures that common structures in terms of the multilevel conditional dependency are shared during the estimation procedure, yet unique structures for each class are retained as well. Our proposed approach is achieved by first introducing a higher level variable factor within a class, and then common factors across classes. The performance of our approach is evaluated on several simulated networks. We also demonstrate the advantage of our approach using breast cancer patient data.
Ph. D.
2

Sheikh-Nia, Samaneh. "An Investigation of Standard and Ensemble Based Classification Techniques for the Prediction of Hospitalization Duration." Thesis, 2012. http://hdl.handle.net/10214/3902.

Abstract:
In any health-care system, early identification of individuals who are most at risk of developing an illness is vital, not only to ensure that a patient is provided with the appropriate treatment, but also to avoid the considerable costs associated with unnecessary hospitalization. To achieve this goal there is a need for a breakthrough prediction method that is capable of dealing with real-world medical data, which is inherently complex. In this study, we show how standard classification algorithms can be employed collectively to predict the length of stay in a hospital of a patient in the upcoming year, based on their medical history. Multiple classifiers are used to perform the prediction task, since real-world medical data is significantly complex, making the classification task very challenging. The data is voluminous, consists of a wide range of class values, some of which have only a few instances, and is highly unbalanced, making the classification of minority classes very difficult. We propose two Sequential Ensemble Classification (SEC) schemes, one based on an ensemble of homogeneous classifiers, and a second based on a heterogeneous ensemble of classifiers, in three hierarchical granularity levels. The goal of using this system is to provide increased performance over the standard classifiers. This method is highly beneficial when dealing with complex data which is multi-class and highly unbalanced.

Conference papers on the topic "Unbalanced multi-class"

1

Ramanan, A., S. Suppharangsan, and M. Niranjan. "Unbalanced Decision Trees for multi-class classification." In 2007 International Conference on Industrial and Information Systems. IEEE, 2007. http://dx.doi.org/10.1109/iciinfs.2007.4579190.

2

Baydogan, Cem, and Bilal Alatas. "Detection of Customer Satisfaction on Unbalanced and Multi-Class Data Using Machine Learning Algorithms." In 2019 1st International Informatics and Software Engineering Conference (UBMYK). IEEE, 2019. http://dx.doi.org/10.1109/ubmyk48245.2019.8965631.

3

Rahbar, Mohammad, Saeed Amirkhani, Ali Chaibakhsh, and Faraz Rahbar. "Unbalance fault localization in rotating machinery disks using EEMD and optimized multi-class SVM." In 2017 IEEE International Instrumentation and Measurement Technology Conference (I2MTC). IEEE, 2017. http://dx.doi.org/10.1109/i2mtc.2017.7969886.

