Dissertations on the topic "Label selection"


Explore the top 28 dissertations for research on the topic "Label selection".

1

Jungjit, Suwimol. "New multi-label correlation-based feature selection methods for multi-label classification and application in bioinformatics." Thesis, University of Kent, 2016. https://kar.kent.ac.uk/58873/.

Abstract:
The very large dimensionality of real-world datasets is a challenging problem for classification algorithms, since many features are often redundant or irrelevant for classification. In addition, a very large number of features leads to a high computational time for classification algorithms. Feature selection methods are used to deal with the large dimensionality of data by selecting a relevant feature subset according to an evaluation criterion. The vast majority of research on feature selection involves conventional single-label classification problems, where each instance is assigned a single class label, but there has been growing research on more complex multi-label classification problems, where each instance can be assigned multiple class labels. This thesis proposes three types of new Multi-Label Correlation-based Feature Selection (ML-CFS) methods, namely: (a) methods based on hill-climbing search, (b) methods that exploit biological knowledge (still using hill-climbing search), and (c) methods based on genetic algorithms as the search method. Firstly, we proposed three versions of ML-CFS methods based on hill-climbing search. In essence, these ML-CFS versions extend the original CFS method by extending the merit function (which evaluates candidate feature subsets) to the multi-label classification scenario, as well as modifying the merit function in other ways. A conventional search strategy, hill-climbing, was used to explore the space of candidate solutions (candidate feature subsets) for those three versions of ML-CFS. These ML-CFS versions are described in detail in Chapter 4. Secondly, in order to try to improve the performance of ML-CFS on cancer-related microarray gene expression datasets, we proposed three versions of the ML-CFS method that exploit biological knowledge. 
These ML-CFS versions are also based on hill-climbing search, but the merit function was modified in a way that favours the selection of genes (features) involved in pre-defined cancer-related pathways, as discussed in detail in Chapter 5. Lastly, we proposed two more sophisticated versions of ML-CFS based on Genetic Algorithms (rather than hill-climbing) as the search method. The first version of GA-based ML-CFS is based on a conventional single-objective GA, where there is only one objective to be optimized, while the second version performs lexicographic multi-objective optimization, where there are two objectives to be optimized, as discussed in detail in Chapter 6. In this thesis, all proposed ML-CFS methods for multi-label classification problems were evaluated by measuring the predictive accuracies obtained by two well-known multi-label classification algorithms when using the selected features, namely the Multi-Label k-Nearest Neighbours (ML-kNN) algorithm and the Back-Propagation for Multi-Label Learning (BPMLL) neural network algorithm. In general, the results obtained by the best version of the proposed ML-CFS methods, namely a GA-based ML-CFS method, were competitive with the results of other multi-label feature selection methods and baseline approaches. More precisely, one of our GA-based methods achieved the second-best predictive accuracy of all methods compared (with both ML-kNN and BPMLL as classifiers), but there was no statistically significant difference in predictive accuracy between that GA-based ML-CFS and the best method. 
In addition, in the experiments with ML-kNN, the most accurate method selects about twice as many features as our GA-based ML-CFS, whilst in the experiments with BPMLL the most accurate method was a baseline that does not perform any feature selection and runs the classifier once (with all original features) for each of the many class labels, which is a very computationally expensive approach. In summary, one of the proposed GA-based ML-CFS methods achieved substantial data reduction (selecting a smaller subset of relevant features) without a significant decrease in predictive accuracy relative to the most accurate method.
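The merit function mentioned above can be illustrated with the classical CFS merit, Merit(S) = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the number of features in S, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. The sketch below is a minimal, hypothetical multi-label extension, not the thesis's exact ML-CFS formula: it assumes absolute Pearson correlation as the correlation measure, averages feature-label correlations over all labels, and runs a forward hill-climbing search that stops once merit no longer improves. The names `ml_merit` and `hill_climb` are illustrative.

```python
import math
from itertools import combinations

def abs_pearson(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def ml_merit(subset, X_cols, Y_cols):
    """CFS-style merit with feature-label correlation averaged over all labels (assumed extension)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs_pearson(X_cols[f], y) for f in subset for y in Y_cols) / (k * len(Y_cols))
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs_pearson(X_cols[a], X_cols[b]) for a, b in pairs) / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def hill_climb(X_cols, Y_cols, min_gain=1e-9):
    """Forward hill-climbing: repeatedly add the feature that most increases the merit."""
    selected, best = [], 0.0
    remaining = list(range(len(X_cols)))
    while remaining:
        cand, score = max(((f, ml_merit(selected + [f], X_cols, Y_cols)) for f in remaining),
                          key=lambda t: t[1])
        if score <= best + min_gain:  # no meaningful improvement: stop climbing
            break
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected, best

# Toy data: feature 0 matches label 0, feature 1 duplicates feature 0, feature 2 is weakly related.
X = [[0, 0, 1, 1, 0, 1], [0, 0, 1, 1, 0, 1], [1, 0, 1, 0, 1, 0]]
Y = [[0, 0, 1, 1, 0, 1], [0, 0, 1, 1, 1, 1]]
print(hill_climb(X, Y))  # selects [0]; merit is about 0.854
```

Adding the duplicate feature does not raise the merit, because its extra label correlation in the numerator is offset by the higher inter-feature correlation in the denominator; this is the redundancy penalty that CFS-style methods rely on.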
2

Gustafsson, Robin. "Ordering Classifier Chains using filter model feature selection techniques." Thesis, Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-14817.

Abstract:
Context: Multi-label classification concerns classification with multi-dimensional output. The Classifier Chain breaks the multi-label problem into multiple binary classification problems, chaining the classifiers to exploit dependencies between labels. Consequently, its performance is influenced by the chain's order. Approaches to finding advantageous chain orders have been proposed, though they are typically costly. Objectives: This study explored the use of filter model feature selection techniques to order Classifier Chains. It examined how feature selection techniques can be adapted to evaluate label dependence, how such information can be used to select a chain order and how this affects the classifier's performance and execution time. Methods: An experiment was performed to evaluate the proposed approach. The two proposed algorithms, Forward-Oriented Chain Selection (FOCS) and Backward-Oriented Chain Selection (BOCS), were tested with three different feature evaluators. 10-fold cross-validation was performed on ten benchmark datasets. Performance was measured by accuracy, 0/1 subset accuracy and Hamming loss. Execution time was measured during chain selection, classifier training and testing. Results: Both proposed algorithms led to improved accuracy and 0/1 subset accuracy (Friedman & Hochberg, p < 0.05). FOCS also improved the Hamming loss while BOCS did not. Measured effect sizes ranged from 0.20 to 1.85 percentage points. Execution time increased by less than 3% in most cases. Conclusions: The results showed that the proposed approach can improve the Classifier Chain's performance at a low cost. The improvements appear similar in magnitude to those of comparable techniques, but at a lower cost. The study shows that feature selection techniques can be applied to chain ordering, demonstrates the viability of the approach and establishes FOCS and BOCS as alternatives worthy of further consideration.
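The abstract does not describe the internals of FOCS or BOCS. The sketch below only illustrates the general idea of using a filter-style dependence measure to order a chain; it assumes empirical mutual information between label columns as the evaluator and a simple greedy forward rule, and `greedy_chain_order` is a hypothetical name, not the thesis's algorithm.

```python
import math
from collections import Counter

def mutual_info(a, b):
    """Empirical mutual information (in nats) between two discrete label columns."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

def greedy_chain_order(Y_cols):
    """Hypothetical forward-greedy chain ordering driven by pairwise label dependence."""
    q = len(Y_cols)
    mi = [[mutual_info(Y_cols[i], Y_cols[j]) if i != j else 0.0 for j in range(q)]
          for i in range(q)]
    remaining = list(range(q))
    # Start with the label that shares the most information with all other labels.
    first = max(remaining, key=lambda i: sum(mi[i]))
    order = [first]
    remaining.remove(first)
    while remaining:
        # Next, take the label most dependent on the labels already in the chain,
        # i.e. the one whose classifier should gain most from their predictions as inputs.
        nxt = max(remaining, key=lambda j: sum(mi[j][i] for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order

Y = [[0, 1, 0, 1, 0, 1, 0, 1],   # label 0: only weakly related to the others
     [0, 0, 1, 1, 0, 1, 0, 1],   # label 1
     [0, 0, 1, 1, 0, 1, 0, 1]]   # label 2: identical to label 1
print(greedy_chain_order(Y))  # [1, 2, 0]
```

A backward variant could instead fix the last position first; either way, the dependence scores are computed once, which is consistent with the low overhead reported above.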
3

Sandrock, Trudie. "Multi-label feature selection with application to musical instrument recognition." Thesis, Stellenbosch: Stellenbosch University, 2013. http://hdl.handle.net/10019/11071.

Abstract:
Thesis (PhD), Stellenbosch University, 2013.
ENGLISH ABSTRACT: An area of data mining and statistics that is currently receiving considerable attention is multi-label learning. Problems in this field concern scenarios where each data case can be associated with a set of labels instead of only one. In this thesis, we review the field of multi-label learning and discuss the lack of suitable benchmark data for evaluating multi-label algorithms. We propose a technique for simulating multi-label data, which allows good control over different data characteristics and could be useful for conducting comparative studies in the multi-label field. We also discuss the explosion of data in recent years and highlight the need for some form of dimension reduction to alleviate the challenges of working with large datasets. Feature (or variable) selection is one way of achieving dimension reduction, and after a brief discussion of different feature selection techniques, we propose a new technique for feature selection in a multi-label context, based on the concept of independent probes. This technique is evaluated empirically on simulated multi-label data, and it is shown to achieve, with a reduced set of features, classification accuracy similar to that achieved with the full set of features. The proposed feature selection technique is then applied to music information retrieval (MIR), specifically to the problem of musical instrument recognition. An overview of MIR is given, with particular emphasis on the instrument recognition problem. The goal of (polyphonic) musical instrument recognition is to automatically identify the instruments playing simultaneously in an audio clip, which is not a simple task. We specifically consider duets (two instruments playing simultaneously) and approach the problem as a multi-label classification task. 
In our empirical study, we illustrate the complexity of musical instrument data and again show that our proposed feature selection technique is effective in identifying relevant features and thereby reducing the complexity of the dataset without negatively impacting on performance.
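The abstract names the concept of independent probes without detailing it. A common reading, assumed here, is to add artificial "probe" features that cannot carry label information (in this sketch, random permutations of real feature columns) and keep only the real features that score higher than every probe. The relevance score (mean absolute correlation with the labels) and the function `probe_select` are illustrative assumptions, not the thesis's exact procedure.

```python
import math
import random

def abs_pearson(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def relevance(feature, Y_cols):
    """Assumed relevance score: mean absolute correlation with every label."""
    return sum(abs_pearson(feature, y) for y in Y_cols) / len(Y_cols)

def probe_select(X_cols, Y_cols, n_probes=20, seed=0):
    """Keep the real features whose relevance exceeds that of every random probe."""
    rng = random.Random(seed)
    probe_scores = []
    for _ in range(n_probes):
        probe = list(X_cols[rng.randrange(len(X_cols))])
        # A random permutation keeps the feature's value distribution but
        # destroys any systematic link to the labels.
        rng.shuffle(probe)
        probe_scores.append(relevance(probe, Y_cols))
    threshold = max(probe_scores)
    selected = [j for j, f in enumerate(X_cols) if relevance(f, Y_cols) > threshold]
    return selected, threshold

y = [0, 0, 1, 1] * 10
X = [list(y), [0, 1, 0, 1] * 10]   # feature 0 equals the label; feature 1 is uncorrelated with it
print(probe_select(X, [y]))
```

Using the maximum probe score as the cut-off is a conservative choice; a quantile of the probe scores would admit more features at the cost of more false positives.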
4

Paredes Zevallos, Daniel Leoncio. "Multi-scale image inpainting with label selection based on local statistics." Master's thesis, Pontificia Universidad Católica del Perú, 2014. http://tesis.pucp.edu.pe/repositorio/handle/123456789/5578.

Abstract:
We propose a novel inpainting method that uses a multi-scale approach to speed up the well-known Markov Random Field (MRF) based inpainting method. MRF-based inpainting methods are slow compared with exemplar-based methods because their computational complexity is O(|L|^2), where |L| is the number of feasible labels (candidate solutions). Our multi-scale approach seeks to reduce the number of feasible labels by selecting labels appropriately using information from the previous (low-resolution) scale. For the initial label selection we use local statistics; moreover, to compensate for the loss of information at low-resolution levels, we use features related to the gradient of the original image. Our computational results show that our approach is competitive in terms of reconstruction quality with the original MRF-based inpainting and with other exemplar-based inpainting algorithms, while being at least one order of magnitude faster than the original MRF-based inpainting and competitive with exemplar-based inpainting.
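The speed-up argument rests on shrinking the label set before the O(|L|^2) pairwise stage. Which local statistics the thesis uses, and how labels are carried between scales, are not stated in the abstract; this sketch assumes the mean and standard deviation of pixel intensities, a mismatch score equal to the sum of their absolute differences, and a fixed budget of k labels per node. It omits the multi-scale propagation and the gradient features, and all names are hypothetical.

```python
import statistics

def local_stats(pixels):
    """Mean and population standard deviation of a list of pixel intensities."""
    return statistics.fmean(pixels), statistics.pstdev(pixels)

def prune_labels(target_known, candidates, k):
    """Keep the k candidate labels whose local statistics best match the known pixels
    around the target node (assumed score: abs. mean difference + abs. std difference)."""
    tm, ts = local_stats(target_known)

    def mismatch(label):
        m, s = local_stats(candidates[label])
        return abs(m - tm) + abs(s - ts)

    return sorted(candidates, key=mismatch)[:k]

def pairwise_evaluations(n_labels):
    """Label pairs scored per MRF edge when every label combination is considered: |L|^2."""
    return n_labels ** 2

target = [10, 10, 12, 12]                 # known pixels around the hole
candidates = {"a": [10, 12, 10, 12], "b": [50, 52, 50, 52],
              "c": [11, 11, 11, 11], "d": [0, 22, 0, 22]}
kept = prune_labels(target, candidates, k=2)
print(kept, pairwise_evaluations(len(candidates)), pairwise_evaluations(len(kept)))
# ['a', 'c'] 16 4
```

Even in this toy case, halving the label set cuts the pairwise work per edge by a factor of four, which is the quadratic saving the abstract relies on.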
5

Duncan, Alyssa Renee. "'Nutrition facts' label use in the selection of healthier foods by undergraduate students." FIU Digital Commons, 1996. http://digitalcommons.fiu.edu/etd/3239.

Abstract:
Use of the "Nutrition Facts" panel on food labels was studied in the selection of healthier substitutes for foods normally consumed by 276 undergraduates (mean age 19.7 ± 2.5 years). Among 1095 label pairs (3.97 per student), 80.6% included a "healthier" substitute. The most common food categories were cookies/bars/tarts (12.8%), cereal (11.8%), chips/crackers (11.1%), beverages (10.2%) and breads/muffins (9.1%). Up to three errors were recorded per label pair, with 384 errors in total, including failure to adjust for serving size (34%), use of pre-NLEA labels (30%), comparison of unlike foods (16%) and unclear comparisons or missing labels (19%). Among 3295 nutrient comparisons, total fat (23.6%), calories (18.4%) and sodium (11.7%) were cited most often. Substitutes ranged from a little healthier (1-10% difference) to a lot healthier (>51% difference) for 83% of nutrients. Sixty percent would purchase the healthier foods again or look for other substitutes, and 47% stated that they preferred the substitute's taste or thought it equivalent.
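Failure to adjust for serving size was the most common error reported above. The arithmetic below, using hypothetical label values, shows how a per-serving comparison can mislead and how normalizing both labels to a common basis (here, per gram of food, an assumption) corrects it. The study's own comparison procedure is not described in the abstract.

```python
def per_gram(amount_g, serving_g):
    """Nutrient amount per gram of food."""
    return amount_g / serving_g

def percent_lower(orig_amount, orig_serving, sub_amount, sub_serving):
    """How much lower (in %) the substitute is than the original, per gram.
    Negative values mean the substitute actually contains more of the nutrient."""
    o = per_gram(orig_amount, orig_serving)
    s = per_gram(sub_amount, sub_serving)
    return 100.0 * (o - s) / o

# Hypothetical labels: original has 10 g fat per 28 g serving,
# substitute has 8 g fat per 14 g serving.
naive = 100.0 * (10 - 8) / 10              # per-serving comparison: looks 20% lower
adjusted = percent_lower(10, 28, 8, 14)    # per-gram comparison: actually 60% higher
print(naive, round(adjusted, 1))  # 20.0 -60.0
```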
6

Gonzalez Lopez, Jorge. "Distributed multi-label learning on Apache Spark." VCU Scholars Compass, 2019. https://scholarscompass.vcu.edu/etd/5775.

Abstract:
This thesis proposes a series of multi-label learning algorithms for classification and feature selection implemented on the Apache Spark distributed computing model. Five approaches for determining the optimal architecture to speed up multi-label learning methods are presented. These approaches range from local parallelization using threads to distributed computing using independent or shared memory spaces. It is shown that the optimal approach performs hundreds of times faster than the baseline method. Three distributed multi-label k-nearest neighbors methods built on top of the Spark architecture are proposed: an exact iterative method that computes pairwise distances, an approximate tree-based method that indexes the instances across multiple nodes, and an approximate locality-sensitive hashing method that builds multiple hash tables to index the data. The results indicate that the predictions of the tree-based method are on par with those of the exact method while reducing execution times in all scenarios. The tree-based method is then used to evaluate the quality of a selected feature subset. The optimal adaptation of a multi-label feature selection criterion is discussed, and two distributed feature selection methods for multi-label problems are proposed: one that selects the feature subset maximizing the Euclidean norm of individual information measures, and one that selects the subset of features maximizing their geometric mean. The results indicate that each method excels in different scenarios depending on the type of features and the number of labels. Rigorous experimental studies and statistical analyses over many multi-label metrics and datasets confirm that the proposals achieve better performance and provide better scalability to bigger data than the state-of-the-art methods they were compared with.
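The two aggregation criteria in the abstract behave differently, which may explain why each method wins in different scenarios. This single-machine sketch contrasts them; it assumes empirical mutual information as the "individual information measure", scores single features rather than searching over subsets, and leaves out Spark entirely. The function `aggregated_scores` is hypothetical.

```python
import math
from collections import Counter

def mutual_info(a, b):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

def aggregated_scores(X_cols, Y_cols):
    """For each feature, aggregate its per-label mutual information in two ways."""
    scores = []
    for f in X_cols:
        mis = [mutual_info(f, y) for y in Y_cols]
        norm = math.sqrt(sum(m * m for m in mis))            # Euclidean norm
        gmean = (math.exp(sum(math.log(m) for m in mis) / len(mis))
                 if all(m > 0 for m in mis) else 0.0)        # geometric mean
        scores.append((norm, gmean))
    return scores

Y = [[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1]]
X = [[0, 0, 0, 0, 1, 1, 1, 1],   # feature A: copies label 0, says nothing about label 1
     [0, 0, 0, 1, 0, 1, 1, 1]]   # feature B: partially informative about both labels
for name, (norm, gmean) in zip("AB", aggregated_scores(X, Y)):
    print(name, round(norm, 3), round(gmean, 3))
```

Feature A ranks first under the Euclidean norm, which rewards a strong signal for any single label, while feature B ranks first under the geometric mean, which is zero for any feature uninformative about even one label.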
7

Lu, Tien-hsin. "SqueezeFit Linear Program: Fast and Robust Label-aware Dimensionality Reduction." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587156777565173.

8

Gharroudi, Ouadie. "Ensemble multi-label learning in supervised and semi-supervised settings." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSE1333/document.

Abstract:
Multi-label learning is a specific supervised learning problem where each instance can be associated with multiple target labels simultaneously. It is ubiquitous in machine learning and arises naturally in many real-world applications such as document classification, automatic music tagging and image annotation. In this thesis, we formulate multi-label learning as an ensemble learning problem in order to provide satisfactory solutions for both the multi-label classification and the feature selection tasks, while being consistent with respect to any type of objective loss function. We first discuss why state-of-the-art multi-label algorithms that use an effective committee of multi-label models suffer from certain practical drawbacks. We then propose a novel strategy to build and aggregate committees based on k-labelsets in the context of ensemble multi-label classification. Next, we analyze in depth the effect of the aggregation step within ensemble multi-label approaches and investigate how this aggregation affects predictive performance with respect to the objective multi-label loss metric. We then address the specific problem of identifying relevant subsets of features, among potentially irrelevant and redundant ones, in the multi-label context based on the ensemble paradigm. Three wrapper multi-label feature selection methods based on the Random Forest paradigm are proposed. These methods differ in the way they account for label dependence within the feature selection process. Finally, we extend the multi-label classification and feature selection problems to the semi-supervised setting and consider the situation where only a few labelled instances are available. We propose a new semi-supervised multi-label feature selection approach based on the ensemble paradigm. 
The proposed model combines ideas from co-training and multi-label k-labelsets committee construction, together with an inner out-of-bag evaluation of label-feature importance. Tested satisfactorily on several benchmark datasets, the approaches developed in this thesis show promise for a variety of applications in supervised and semi-supervised multi-label learning.
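As background for the k-labelsets committees discussed above, this sketch shows the classical RAkEL-style scheme: draw random subsets of k labels, let one member model predict each subset, and average the votes each label receives. It is not the new aggregation strategy proposed in the thesis; the member models are replaced by hard-coded predictions, and the 0.5 threshold is an assumption.

```python
import random
from itertools import combinations

def random_k_labelsets(n_labels, k, m, seed=0):
    """Draw m distinct k-label subsets uniformly at random (RAkEL-style)."""
    rng = random.Random(seed)
    return rng.sample(list(combinations(range(n_labels), k)), m)

def aggregate(labelsets, member_predictions, n_labels, threshold=0.5):
    """Average the 0/1 votes each label receives from the members whose labelset covers it;
    a label never covered by any member is predicted 0."""
    votes = [0] * n_labels
    covers = [0] * n_labels
    for labelset, pred in zip(labelsets, member_predictions):
        for label in labelset:
            covers[label] += 1
            votes[label] += pred[label]
    return [1 if covers[j] and votes[j] / covers[j] > threshold else 0
            for j in range(n_labels)]

# Three members over 3 labels; each dict holds one member's 0/1 predictions for its labelset.
labelsets = [(0, 1), (1, 2), (0, 2)]
preds = [{0: 1, 1: 0}, {1: 1, 2: 1}, {0: 1, 2: 0}]
print(aggregate(labelsets, preds, n_labels=3))  # [1, 0, 0]: labels 1 and 2 tie at 0.5
```

The tie handling shows why the aggregation step matters: whether a 0.5 average counts as positive changes the prediction, and the best choice depends on the loss being optimized, which is the question the thesis studies.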
9

Narassiguin, Anil. "Apprentissage Ensembliste, Étude comparative et Améliorations via Sélection Dynamique" [Ensemble Learning, Comparative Study and Improvements via Dynamic Selection]. Thesis, Lyon, 2018. http://www.theses.fr/2018LYSE1075/document.

Abstract:
Ensemble methods have been a very popular research topic during the last decade. Their success arises largely from the fact that they offer an appealing solution to several interesting learning problems, such as improving prediction accuracy, feature selection, metric learning, scaling inductive algorithms to large databases, learning from multiple physically distributed data sets, and learning from concept-drifting data streams. In this thesis, we first present an extensive empirical comparison of nineteen prototypical supervised ensemble learning algorithms proposed in the literature, on various benchmark data sets. We compare not only their performance in terms of standard metrics (accuracy, AUC, RMS) but also their kappa-error diagrams, calibration and bias-variance properties. We then address the problem of improving the performance of ensemble learning approaches with dynamic ensemble selection (DES). Dynamic pruning is the problem of finding, for a given input x, the subset of models in the ensemble that achieves the best possible prediction accuracy. The idea behind DES approaches is that different models have different areas of expertise in the instance space. Most methods proposed for this purpose estimate the individual relevance of the base classifiers within a local region of competence, usually given by the nearest neighbours in Euclidean space. We propose and discuss two novel DES approaches. The first, called ST-DES, is designed for decision-tree-based ensemble models. This method prunes the trees using an internal supervised tree-based metric; it is motivated by the fact that in high-dimensional data sets, usual metrics such as Euclidean distance suffer from the curse of dimensionality. The second approach, called PCC-DES, formulates the DES problem as a multi-label learning task with a specific loss function. 
Labels correspond to the base classifiers, and multi-label training examples are formed based on the ability of each classifier to correctly classify each original training example. This allows us to take advantage of recent advances in multi-label learning. PCC-DES works with both homogeneous and heterogeneous ensembles. Its advantage is that it explicitly captures the dependencies between the classifiers' predictions. These algorithms are tested on a variety of benchmark data sets, and the results demonstrate their effectiveness against competitive state-of-the-art alternatives.
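Two DES ideas from the abstract can be sketched together: the multi-label reformulation used by PCC-DES (one meta-label per base classifier, equal to 1 when that classifier is correct on an example), and the classical nearest-neighbour region of competence. The selection rule shown, picking the classifier with the best accuracy among the k nearest validation points (often called overall local accuracy), is a standard DES baseline, not ST-DES or PCC-DES; PCC-DES would instead train a probabilistic classifier chain on the meta-labels. All names are illustrative.

```python
import math

def meta_labels(base_predictions, y_true):
    """Multi-label reformulation of DES: for each example, one 0/1 label per base
    classifier, equal to 1 when that classifier predicted the example correctly."""
    return [[int(pred[i] == y_true[i]) for pred in base_predictions]
            for i in range(len(y_true))]

def ola_select(x, X_val, meta, k=3):
    """Select the base classifier with the highest accuracy on the k validation
    points nearest to x in Euclidean distance (overall local accuracy)."""
    nearest = sorted(range(len(X_val)), key=lambda i: math.dist(x, X_val[i]))[:k]
    n_clf = len(meta[0])
    local_acc = [sum(meta[i][c] for i in nearest) / len(nearest) for c in range(n_clf)]
    return max(range(n_clf), key=lambda c: local_acc[c])

X_val = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
y_val = [0, 0, 0, 1, 1, 1]
base_predictions = [[0, 0, 0, 0, 0, 0],   # classifier 0: competent only near the origin
                    [1, 1, 1, 1, 1, 1]]   # classifier 1: competent only near (10, 10)
meta = meta_labels(base_predictions, y_val)
print(ola_select((0.2, 0.3), X_val, meta), ola_select((10.4, 10.6), X_val, meta))  # 0 1
```

The Euclidean neighbourhood in `ola_select` is exactly what ST-DES is said to replace with a supervised tree-based metric in high-dimensional data.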
10

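Kraus, Vivien. "Apprentissage semi-supervisé pour la régression multi-labels : application à l’annotation automatique de pneumatiques" [Semi-supervised learning for multi-label regression: application to the automatic annotation of tires]. Thesis, Lyon, 2021. https://tel.archives-ouvertes.fr/tel-03789608.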

Abstract:
With the advent and rapid growth of digital technologies, data has become both a precious and a plentiful asset. However, such abundance raises issues of data quality and labelling. Because available data volumes keep growing while labelling by human experts remains very costly, it is increasingly necessary to reinforce semi-supervised learning by exploiting unlabeled data. This problem is all the more pronounced in the multi-label learning framework, and in particular for regression, where each statistical unit is guided by many different targets that take the form of numerical scores. This thesis focuses on this fundamental framework. First, we propose a method for semi-supervised regression, which we test through a detailed experimental study. Building on this new method, we present a second contribution, better suited to the multi-label framework, and show its effectiveness in a comparative study on data sets from the literature. Furthermore, the dimensionality of the problem remains a persistent difficulty in machine learning, and reducing it attracts the interest of many researchers. Feature selection is one of the major tasks addressing this problem, and we study it here in a complex framework: semi-supervised, multi-label regression. Finally, an experimental validation is carried out on a real problem of automatic tire annotation, addressing the needs expressed by the industrial partner of this thesis.
11

Spolaôr, Newton. "Seleção de atributos para aprendizagem multirrótulo" [Feature selection for multi-label learning]. Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-25032015-160505/.

Abstract:
Irrelevant and/or redundant features in data can degrade the performance of the classifiers that machine learning algorithms build from that data. Feature selection algorithms aim to identify these features and remove them from the data before classifiers are constructed. Feature selection in single-label data, in which each instance in the training set is associated with only one label, has been widely studied in the literature. However, this is not the case for multi-label data, in which each instance is associated with a set of labels. Moreover, because multi-label data usually exhibit relationships among the labels in the label set, machine learning algorithms should take these relationships into account. Label dependence should therefore also be explored by multi-label feature selection algorithms. The filter approach is one of the most common approaches used by feature selection algorithms, as it has a potentially lower computational cost than other approaches and uses general properties of the data, such as feature-class correlation, to calculate feature importance measures. The hypothesis of this work is that feature selection algorithms that consider label dependence will perform better than those that disregard it. To this end, this work proposes and develops filter-approach multi-label feature selection algorithms that take relations among labels into account. In particular, we propose two methods that account for these relations by performing label construction and by adapting the single-label feature selection algorithm ReliefF. These methods were evaluated experimentally and showed good performance in terms of feature reduction and the predictive performance of the classifiers built using the selected features.
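As an illustration only (not the thesis's implementation), the sketch below combines the two ideas named in the abstract in their simplest form, under stated assumptions: label construction is assumed to be the label powerset transformation, which turns each distinct label set into one class so that label co-occurrence is preserved, and the feature scorer is basic Relief (one nearest hit and one nearest miss) rather than full ReliefF. The function names `label_powerset` and `relief_weights` are hypothetical, and features are assumed to be scaled to [0, 1].

```python
import numpy as np


def label_powerset(Y):
    """Map each distinct row of a binary label matrix Y (n x q) to one class id.

    Assumed label-construction step: rows with the same label set share a class.
    """
    # return_inverse gives, for every row, the index of its unique label set.
    _, classes = np.unique(np.asarray(Y), axis=0, return_inverse=True)
    return classes.ravel()


def relief_weights(X, y):
    """Basic Relief (one nearest hit, one nearest miss); X assumed in [0, 1].

    Features that differ on the nearest miss gain weight; features that
    differ on the nearest hit lose weight.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(X - X[i]).sum(axis=1)   # Manhattan distance to all instances
        dist[i] = np.inf                       # never pick the instance itself
        same = y == y[i]
        hit_d = np.where(same, dist, np.inf)
        miss_d = np.where(~same, dist, np.inf)
        if np.isfinite(hit_d.min()):           # skip if the class has no other member
            w -= np.abs(X[i] - X[np.argmin(hit_d)])
        if np.isfinite(miss_d.min()):          # skip if there is only one class
            w += np.abs(X[i] - X[np.argmin(miss_d)])
    return w / n
```

On a toy dataset where only the first feature tracks the two label sets, that feature receives the higher weight; a full ReliefF would average over k neighbours per class instead of one.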
12

Tan, Run Yan. "Active Learning using a Sample Selector Network." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-287312.

Abstract:
In this work, we assume a limited labelling budget and propose using a sample selector network to learn and select effective training samples, whose labels we then acquire to train the target model that performs the required machine learning task. We assume that the sample features, the state of the target model, and the training loss of the target model are informative for training the sample selector network. In addition, we approximate the state of the target model with its intermediate and final network outputs. We investigate whether, under a limited labelling budget, the sample selector network can learn to select training samples that train the target model at least as effectively as another training subset of the same size drawn uniformly at random from the full training dataset; the latter is the common procedure for training machine learning models without active learning, which we refer to as the traditional uniform random sampling method. We perform experiments on the MNIST and CIFAR-10 datasets and provide empirical evidence that, under a constrained labelling budget and some other conditions, active learning using a sample selector network enables the target model to learn more effectively.
13

Ögren, Niklas. "Selecting/realization of Virtual Private Networks with Multiprotocol Label Switching or Virtual Local Area Networks." Thesis, KTH, Mikroelektronik och Informationsteknik, IMIT, 2002. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-93211.

Abstract:
Many reports have been written about the techniques behind Virtual Private Networks (VPN) and Multiprotocol Label Switching (MPLS). They usually deal with the low-level design of the software implementing a specific technique. Initial products are usually not mature enough to run in a large network, or have to be adjusted in some way to fit. This report investigates different ways of implementing strict layer 2 Virtual Private Networks in an existing nationwide Gigabit Ethernet network. The existing infrastructure and hardware have to be used without major changes. Development has continued since 1998/1999, when MPLS first appeared in laboratories. Today it is possible to introduce MPLS or tunneled nationwide virtual local area networks into an existing network. This requires high-speed, fault-tolerant, and stable hardware and software. Going beyond the separation of traffic at layer 3 with Virtual Private Networks (e.g., IPSec), layer 2 traffic can be tunneled through a network. Although the first layer 3 VPN products are already in use, layer 2 VPNs still need to be evaluated and brought into regular use. There are currently two ways of tunneling VLANs in a core network: tunneled VLANs (which Extreme Networks calls VMANs) and MPLS. This project showed that it is possible to start with a VLAN-only solution and then upgrade to MPLS to solve scalability issues. The VMAN solution cannot be used at Arrowhead, because Extreme Networks' implementation has too many disadvantages. However, a hybrid that tunnels VMANs in a VLAN core is possible and enables customer tagging of VLANs in a layer 2 VPN. Furthermore, testing of EAPS and the per-VLAN Spanning Tree Protocol went well and showed that EAPS should not be used when there is more than one loop.
14

Tolbert, Thomas J. (Thomas James) 1969. "Synthesis of RNA with selective isotopic labels for NMR structural studies." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/50341.

15

Tomeš, Jan. "Analýza přesnosti výroby lamel formy pneumatiky vyráběných SLM technologií." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2016. http://www.nusl.cz/ntk/nusl-241858.

Abstract:
The first part of the diploma thesis focuses on the analysis and evaluation of the current production of sipes on two SLM machines, the PXL from Phenix Systems and the M2 Cusing from Concept Laser. Samples from both machines went through the same manufacturing process and the same measurement and evaluation procedure so that the machines could be compared. The geometric accuracy, surface roughness, mechanical properties, and material structure of the samples were compared. A digital methodology for evaluating the geometry of the sipes had to be created. In the second part of the thesis, process parameters are selected on the basis of research, and their influence on the surface roughness of the manufactured sipes is analyzed.
16

Norris, Maria. "Contesting identity and preventing belonging? : an analysis of British counter terrorism policy since the Terrorism Act 2000 and the selective use of the terrorism label by the British Government." Thesis, London School of Economics and Political Science (University of London), 2015. http://etheses.lse.ac.uk/3348/.

Abstract:
In 2013, Lee Rigby was murdered in Woolwich. In retaliation, there were several attacks on the Muslim community. Both series of events fall under the legal definition of terrorism in the Terrorism Act 2000. Nonetheless, only Rigby's murder was treated as an act of terror by the government. This raises the question: since terrorism is legally defined in a broad and neutral way, what explains the UK government's selective use of the terrorism label? Answering this question begins by looking at terrorism from the perspective of Critical Terrorism Studies, approaching the label of terrorism as an act of securitization. The thesis therefore goes beyond the legal definition of terrorism, seeking to uncover the official UK policy narrative on terrorism. To do this, it analyses the three versions of Contest: The United Kingdom's Strategy for Countering Terrorism, the government's official terrorism policy papers. The analysis reveals an official policy narrative of terrorism that securitizes Islam, Muslims, and Muslim identity by constructing a causal story that places ideology and identity at the heart of the explanation for terrorism. Moreover, the concern with identity gives the narrative a strong nationalist character. This is further deconstructed using the boundary-security nexus, which incorporates boundary and nationalism theory into securitization and thereby helps to understand and explain how discursive constructions of security and identity work in a dialectic relationship. Once the nexus is introduced, it becomes clear how the government's selective use of the terrorism label may not only further securitize Islam and the Muslim community but also act as a way of protecting and reinforcing the bounded community of the nation state.
17

Tsai, Yue-Yang, and 蔡岳洋. "Semi-supervised Feature Selection Using Soft-label Information." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/51541470430761590429.

Abstract:
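Master's thesis
National Chiao Tung University
Institute of Computer Science and Engineering
Academic year 100 (ROC calendar)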
Feature selection is an important task in machine learning; in practice, the quality of the features affects the results of machine learning algorithms. Supervised feature selection requires sufficient labeled data. However, labeling is time-consuming and typically done manually, whereas unlabeled data are relatively easy to collect. Although unsupervised feature selection does not require labeled data, the prior information carried by labeled data should be used when it is available. This paper therefore proposes a semi-supervised feature selection algorithm, called the soft-label semi-supervised feature selection algorithm, that considers both labeled and unlabeled data. The algorithm applies a semi-supervised logistic regression algorithm to obtain soft-label information for the unlabeled data, and applies a proposed soft-label mutual information formula that combines label information and soft-label information to find the best feature subset. Experiments on several datasets indicate that the proposed algorithm can effectively improve classification performance.
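The abstract does not give the soft-label mutual information formula, so the following is a hypothetical sketch of one way such a score could work, not the thesis's formula. It assumes a discrete feature and lets every labeled instance add a full count to a joint feature-label table, while every unlabeled instance adds fractional counts equal to its soft label p(y | x) (for example, probabilities from a semi-supervised logistic regression). Mutual information is then computed from the normalised table. The function name and arguments are illustrative.

```python
import numpy as np


def soft_label_mutual_information(f_lab, y_lab, f_unl, p_unl, n_values, n_classes):
    """Hypothetical soft-label mutual information I(F; Y) in nats.

    f_lab, y_lab: integer feature values and hard labels of labeled instances.
    f_unl: integer feature values of unlabeled instances.
    p_unl: (n_unlabeled x n_classes) soft labels, one probability row per instance.
    """
    joint = np.zeros((n_values, n_classes))
    # Hard labels: each labeled instance contributes one full count.
    np.add.at(joint, (np.asarray(f_lab), np.asarray(y_lab)), 1.0)
    # Soft labels: each unlabeled instance spreads one count over the classes.
    for v, probs in zip(f_unl, p_unl):
        joint[v] += probs
    joint /= joint.sum()
    pf = joint.sum(axis=1, keepdims=True)    # marginal of the feature, column vector
    py = joint.sum(axis=0, keepdims=True)    # marginal of the label, row vector
    nz = joint > 0                           # 0 * log 0 is taken as 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pf @ py)[nz])))
```

Confident soft labels that agree with the feature raise the score, while uniform soft labels pull it toward independence; ranking features by this score would give a filter-style selector.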
18

Posinasetty, Anusha. "Multi-label Classification with Multiple Label Correlation Orders And Structures." Thesis, 2016. http://etd.iisc.ernet.in/2005/3719.

Abstract:
Multilabel classification has attracted much interest in recent times due to the wide applicability of the problem and the challenges involved in learning a classifier for multilabeled data. A crucial aspect of multilabel classification is to discover the structure and order of correlations among labels and their effect on the quality of the classifier. In this work, we propose a structural Support Vector Machine (structural SVM) based framework that enables us to systematically investigate the importance of label correlations in multi-label classification. The proposed framework is very flexible, provides a unified approach to handling multiple correlation orders and structures adaptively, and helps to assess how much label correlations improve generalization performance. We perform an extensive empirical evaluation on several datasets from different domains and present results on various performance metrics. Our experiments provide, for the first time, interesting insights into the following questions: a) Are label correlations always beneficial in multilabel classification? b) What effect do label correlations have on the performance metrics typically used in multilabel classification? c) Is label correlation order significant, and if so, what is the favorable correlation order for a given dataset and a given performance metric? and d) Can we make useful suggestions on the label correlation structure?
19

Ma, Long. "A Multi-label Text Classification Framework: Using Supervised and Unsupervised Feature Selection Strategy." 2017. http://scholarworks.gsu.edu/cs_diss/123.

Abstract:
Text classification, the task of assigning metadata to documents, requires significant time and effort when performed by humans. Moreover, with online-generated content growing explosively, manually annotating large-scale, unstructured data has become a challenge. Many state-of-the-art text mining methods have been applied to classification, many of them based on keyword extraction. However, when these keywords are used as features in classification tasks, the feature dimension is commonly huge. Selecting keywords from large numbers of documents as features for classification is itself a challenge, and when traditional machine learning algorithms are applied to large datasets, the computational cost can be high. In addition, almost 80% of real data is unstructured and unlabeled, so advanced supervised feature selection methods cannot be used directly to select entities from massive data. Statistical strategies are usually used to discover key features when extracting features from unlabeled data for classification tasks; we instead propose a novel method to extract important features effectively before feeding them into the classification task. Another challenge in text classification is the multi-label problem, the assignment of multiple non-exclusive labels to documents, which makes text classification more complicated than single-label classification. Considering these issues, we develop a framework that extracts features and reduces data dimensionality, solving the multi-label problem on labeled and unlabeled datasets. To reduce data dimension, we provide 1) a hybrid feature selection method that extracts meaningful features according to the importance of each feature; 2) the use of Word2Vec to represent each document with a lower feature dimension when categorizing documents in big datasets; and 3) an unsupervised approach to extract features from real online-generated data for text classification and prediction. To solve the multi-label classification task, we also design a new Multi-Instance Multi-Label (MIML) algorithm within the proposed framework.
20

Chang, Jen Fu, and 張仁輔. "Label-free Selection of Liver Cancer Stem Cells by Using Polyelectrolyte Multilayer Films." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/50250671387046536855.

Abstract:
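Master's thesis
Chang Gung University
Graduate Institute of Biochemical and Biomedical Engineering
Academic year 101 (ROC calendar)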
The majority of hepatocellular carcinoma (HCC) patients present at an advanced stage, for which chemotherapy and radiotherapy have limited efficacy. Early diagnosis and treatment of HCC remain challenging because highly specific and sensitive markers are lacking. Cancer stem cells (CSCs) are the source of many solid tumor types, including HCC, and recent cancer research has turned toward CSCs to develop new directions for cancer therapy. The surface properties of materials can be regulated by layer-by-layer polyelectrolyte multilayer (PEM) films, which in turn affect cell behavior. By varying the polyelectrolyte materials and the number of layers, a series of microenvironments was established to control cell morphology and cell attachment. PEM films have previously been shown to select liver stem/progenitor cells, which suggests that CSC colonies could be selected by a microenvironment controlled through PEM architecture. The aim of this study was to use glass as the base substrate and to sequentially deposit positively charged poly(allylamine hydrochloride) (PAH) and negatively charged poly(sodium 4-styrene sulfonate) (PSS) by the layer-by-layer technique to fabricate a series of PEM films. The liver cancer cell line Huh7 was cultured on the series of PEM films in order to establish a label-free system for selecting liver cancer stem cells. A quartz crystal microbalance with a dissipation sensor was used to measure the oscillation frequency and analyze the dissipation of the PEM films. Cell behavior and colony formation were observed with an optical microscope. In addition, cell cytotoxicity, CSC marker expression (including double staining of CD133/CD44 and CD133/EpCAM), drug sensitivity, and cell cycle assays were used to analyze cells isolated from the different PEM films.
The results showed obvious cell aggregation on the (PAH/PSS)4-PAH and (PAH/PSS)6-PAH substrates. In addition, flow cytometry of CSC marker expression showed that cells selected on the (PAH/PSS)4-PAH and (PAH/PSS)6-PAH substrates displayed high CD133/CD44 expression, and the percentage increased with culture time. Furthermore, doxorubicin, a commonly used anticancer drug, was used to measure the drug sensitivity of the cells selected on the different substrates; cells selected on (PAH/PSS)4-PAH and (PAH/PSS)6-PAH showed low dependence on drug concentration. However, a cell cycle assay revealed that most of the selected cells were arrested in S phase, suggesting proliferation of the liver CSCs. In conclusion, a series of microenvironments constructed from PEM films can select and purify CSCs, and the surface characteristics of these PEM films account for the relationship between the microenvironment and liver cancer stem cells. This system could be used for drug screening and may provide a new strategy for developing liver cancer therapy.
21

Tsai, Fu-Kun, and 蔡富焜. "The Study of Configure flash off label Equipment Selection Assessment-Taking a Large Department Stores as an Example." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/skz8v4.

Abstract:
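Master's thesis
China University of Science and Technology
Master's Program in Civil Engineering, Disaster Prevention and Management
Academic year 102 (ROC calendar)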
Finding refuge is the most important element of a large department store's fire-protection safety plan. When a fire occurs in a large department store, two questions are the main topics of this study: how occupants use the labeling equipment and disaster-avoidance information provided inside the building, and how they receive information on finding refuge. The study is a case study of a large department store. Expert interviews, questionnaires, and the analytic hierarchy process (AHP) are used to derive the configuration of flash-off label equipment and the items requiring attention in a large department store. The results indicate, first, that flash and sound should be used to provide information on finding refuge, and second, that the equipment should help the weak and the timid leave the fire area along the refuge path in the shortest time. Furthermore, AHP analysis of the questionnaire data ranks the selection criteria for flash-off label equipment, in order, as functionality, distinguishability, quality, construction method, and cost. For guiding occupants to refuge, R-type flash-off labels rank ahead of P-type. These results may serve as a reference for selecting flash-off label equipment in the future.
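AHP turns pairwise judgments between criteria into weights. As a minimal sketch of how that step usually works, assuming Saaty's principal-eigenvector method and his random-index table for the consistency ratio (the thesis's exact computation is not described), the function below takes a reciprocal comparison matrix. The matrices in the usage check are hypothetical examples, not the thesis's questionnaire data.

```python
import numpy as np

# Saaty's random consistency index, keyed by matrix size n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(A):
    """Return (weights, consistency ratio) for a reciprocal comparison matrix A.

    A[i][j] states how much more important criterion i is than criterion j.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)               # principal eigenvalue, lambda_max
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                              # normalise weights to sum to 1
    lambda_max = eigvals[k].real
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0   # consistency index
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] > 0 else 0.0
    return w, cr
```

A consistency ratio below 0.1 is the conventional threshold for accepting a respondent's judgments; with five criteria, as in the abstract, the matrix would be 5 × 5.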
22

Losik, Tatiana. "Your inner garden: children's book project on the artificial selection of labels." Master's thesis, 2021. http://hdl.handle.net/10400.26/38774.

Abstract:
This master's project aims to contribute to a global effort to change certain patterns of thinking, with an object of visual culture that interests children and makes them aware that all the ideas about ourselves that we believe to be true can be challenged. Throughout life, people collect labels, and those labels influence how a person thinks about themselves and behaves based on that knowledge. But the way one person sees another has nothing to do with who that person actually is, because other people's ideas exist only in their own perception of reality, shaped by their way of seeing the world and other factors. It is possible to rethink negative labels that have been attached to us in the past, to stop reproducing them in our minds, and to think positive thoughts with kindness and appreciation for ourselves and others. The project is an illustrated picture book that interactively presents research-based content in a way that a child can understand. Since we are part of an increasingly digital reality, the book gives access to a mobile app where readers can collect their accomplishments, even the smallest ones, to build positive self-esteem. The choice of content and the way it is treated came from this pressing issue, which is present in the daily lives of most people; the target age group is 7 years and older. When I finished the book "Inner Garden", my colleague Joana Sofia dos Santos Guerreiro used it for her Master's thesis and presented it to children in the Portuguese primary school system to find out their reactions and opinions about the book.
23

WANG, XIAO-DONG, and 王曉棟. "Robust and Fast Feature Selection Methods for High-dimensional Data with Limited Labels." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/6ques9.

Abstract:
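Doctoral dissertation
Chaoyang University of Technology
Department of Information Management
Academic year 107 (ROC calendar)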
Feature selection is one of the most representative techniques in pattern recognition; it filters data attributes, removes redundant features, and improves the performance of subsequent classification or clustering tasks. In recent years, with increasingly powerful multimedia and computer technologies, high-dimensional data have been generated rapidly. Because collecting sufficient labels for such large amounts of data is extremely expensive, more and more data come with few labels, which presents a great challenge to existing feature selection methods. Designing reasonable and effective feature selection models for data with limited label information has therefore become increasingly important. To meet the requirements of data of different scales with limited labels, this dissertation designs several semi-supervised and unsupervised feature selection algorithms and combines them with various applications, such as multi-label learning, multi-task learning, and clustering. First, to handle small-scale high-dimensional data, we propose a semi-supervised feature learning model in which the Laplacian matrix construction is constrained by the -norm and is made robust to outliers by removing redundant connections among nodes. Second, a semi-supervised feature selection model based on multi-task learning is proposed for large-scale data. This model is independent of graph construction and explores the information shared among tasks through a low-rank regularization; by transferring relevant information among tasks, it can properly preserve the most important features. Finally, for large-scale high-dimensional data without labels, we propose a flexible objective function that adaptively performs feature learning with clustering and is suitable for data with different kinds of distributions.
Experimental results show that the three proposed models can efficiently select the most representative features, with higher accuracy than other classic algorithms in the limited-label scenario. Moreover, the proposed models are general and can be extended to other applications.
24

Tiun, Ting-Kng, and 張呈光. "A Sequeacial Feature Selecting Strategy Based on Relevance Between Data Label and Principle Component." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/bw8d5v.

Abstract:
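Master's thesis
National Taiwan University of Science and Technology
Department of Information Management
Academic year 105 (ROC calendar)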
Binary classification methods predict the class of an object from its associated feature vector. Traditional classification methods usually suffer from the high dimensionality of the feature vector, so the number of features needs to be reduced. There are two major approaches to reducing the number of features. One is to select a subset of the original features, which preserves the original meaning of each feature. Correlations among the original features make it difficult to find a proper subset of significant features among a large number of candidates, so stochastic optimization algorithms are needed. The other approach first transforms the original attributes into uncorrelated integrated features by principal component analysis (PCA) and then sequentially searches for the subset of significant integrated features. This second approach removes the correlation among integrated features, making a sequential search for the subset of significant integrated features feasible, at the cost of the interpretability of the significant features. In this study, we first transform the original features into uncorrelated integrated features by PCA and then rank the integrated features by their variances. To find the subset of significant integrated features, integrated features are added one at a time in rank order. For each subset of integrated features, a test score that is a linear combination of the integrated features is generated for classification. The coefficients of the linear combination are determined with a genetic algorithm (GA) so as to maximize the area under the receiver operating characteristic (ROC) curve of the test score. Besides this self-developed classifier, we applied two other commonly used classifiers for comparison.
Using the training data, the classification accuracy of each subset is evaluated, and the subset with the highest classification accuracy is the final subset of significant integrated features used for classification. In addition to ranking the integrated features by their variances, we can also rank them by their Fisher information, $R^2$, and AUC, and then sequentially grow the subset of integrated features according to the resulting ranks. Experimental results show that ranking by Fisher information can sometimes yield a better subset than ranking by PCA variance alone. However, ranking by PCA variance produces more consistent performance and requires less computation. The conditions under which Fisher information or other correlation measures used for selection give better classification performance than PCA variance merit further investigation.
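The sequential strategy described above can be sketched in a few lines, with two substitutions that are assumptions of this sketch rather than the thesis's method: the coefficients of the linear test score come from Fisher's linear discriminant instead of a genetic algorithm that maximizes AUC, and subsets are compared by training-set AUC (computed as the Mann-Whitney statistic) instead of classification accuracy. Components are ranked by variance, and the prefix of top-ranked components with the best score is kept. All function names are illustrative.

```python
import numpy as np


def auc(scores, y):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def pca_components(X):
    """Project centred X onto its principal axes, ordered by decreasing variance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]        # eigh returns eigenvalues in ascending order
    return Xc @ eigvecs[:, order]


def fisher_score(Z, y):
    """Linear test score from Fisher's discriminant (stand-in for the GA search)."""
    m1, m0 = Z[y == 1].mean(axis=0), Z[y == 0].mean(axis=0)
    Sw = np.cov(Z[y == 1], rowvar=False) + np.cov(Z[y == 0], rowvar=False)
    coef = np.linalg.pinv(np.atleast_2d(Sw)) @ (m1 - m0)   # within-class scatter^-1 (m1 - m0)
    return Z @ coef


def sequential_pca_selection(X, y):
    """Grow the subset of top-k components in variance order; return (best k, its AUC)."""
    Z = pca_components(X)
    results = [(k, auc(fisher_score(Z[:, :k], y), y)) for k in range(1, Z.shape[1] + 1)]
    return max(results, key=lambda r: r[1])  # ties keep the smallest k
```

Swapping the ranking key (for example, per-component Fisher information instead of variance) only changes the order of the columns in `Z`; the forward search itself stays the same.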
25

Bouhlal, Yasser. "A Retrospective and Prospective Analysis of the Demand for Cheese Varieties in the United States." Thesis, 2012. http://hdl.handle.net/1969.1/ETD-TAMU-2012-05-10745.

Abstract:
The United States cheese consumption has grown considerably over the years. Using Nielsen Homescan panel data for calendar years 2005 and 2006, this dissertation examines the effect of economic and socio-demographic factors on the demand for disaggregated cheese varieties and on the cheese industry in general. In the first essay, we estimated the censored demand for 14 cheese varieties and identified the respective own-price and cross-price elasticities. Also, non-price factors were determined affecting the purchase of each variety as well as the impact of generic dairy advertising. Results revealed that most of the natural cheese varieties have an elastic demand while the processed cheese products exhibited inelastic demands. Strong substitution and complementarity relationships were identified as well, and a two quarter carry-over effect of advertising was observed for most of cheese demands. Results also showed that household demographics affected the demands differently, depending on the nature of the cheese varieties. The second essay examined the impact of retail promotion on the decision to purchase private label processed cheese products using a probit model. A strong negative relationship was found between national brand manufacturer couponing activity and the private label purchase decision. Therefore, national brand couponing appears to be an effective strategy for manufacturers to deter private label growth. This analysis also shows that the decision of purchasing a private label cheese product is influenced by socio-demographic characteristics of the household, namely household income and size, age and education level of the household head, race, ethnicity, and location. In the third study, the feasibility of fortifying processed cheese with omega-3 is investigated. 
This ex-ante analysis takes market conditions into account and evaluates the increase in demand for processed cheese needed to offset the cost of fortification and maintain the profitability of manufacturers such as Kraft. First, the censored demand for processed cheese products is estimated using panel data; the profitability of manufacturing such a product is then determined. The analysis shows that, under reasonable market conditions and reasonable marginal costs, fortifying processed cheese products with omega-3 fatty acids is indeed feasible for manufacturers from a profitability standpoint.
26

Dasgupta, Sanjoy, Adam Tauman Kalai, and Claire Monteleoni. "Analysis of Perceptron-Based Active Learning." 2005. http://hdl.handle.net/1721.1/30585.

Abstract:
We start by showing that in an active learning setting, the Perceptron algorithm needs $\Omega(\frac{1}{\epsilon^2})$ labels to learn linear separators within generalization error $\epsilon$. We then present a simple selective sampling algorithm for this problem, which combines a modification of the perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error $\epsilon$ after asking for just $\tilde{O}(d \log \frac{1}{\epsilon})$ labels. This exponential improvement over the usual sample complexity of supervised learning has previously been demonstrated only for the computationally more complex query-by-committee algorithm.
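The abstract describes the selective sampling algorithm only at a high level. The sketch below is one reading of it. It assumes the rules of the published Dasgupta–Kalai–Monteleoni algorithm as we understand them: the hypothesis v is a unit vector, a label is requested only when |v·x| ≤ s, a mistake triggers the reflection v ← v − 2(v·x)x, and the threshold s is halved after R consecutive queried labels without a mistake. The initial threshold 1/√d, the value R = 10, and all function names are illustrative assumptions, not parameters taken from the report.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def random_unit_vector(d: int, rng: random.Random) -> Vector:
    # Uniform point on the unit sphere: normalize a standard Gaussian vector.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(dot(g, g))
    return [a / n for a in g]


def reflect_update(v: Vector, x: Vector) -> Vector:
    # Modified perceptron step on a mistake: v <- v - 2 (v.x) x.
    # For unit x this reflects v and keeps ||v|| = 1.
    m = dot(v, x)
    return [a - 2.0 * m * b for a, b in zip(v, x)]


def selective_sampling_perceptron(
    stream: List[Vector], oracle: Callable[[Vector], int], d: int, R: int = 10
) -> Tuple[Vector, int]:
    """Return (final unit vector v, number of labels queried)."""
    y0 = oracle(stream[0])
    v = [y0 * a for a in stream[0]]   # start from the first labeled point
    queries = 1
    s = 1.0 / math.sqrt(d)            # assumed initial filtering threshold
    clean = 0                         # consecutive queried labels without a mistake
    for x in stream[1:]:
        margin = dot(v, x)
        if abs(margin) > s:
            continue                  # filtering rule: skip confident points
        y = oracle(x)
        queries += 1
        if y * margin <= 0:           # mistake (or zero margin)
            v = reflect_update(v, x)
            clean = 0
        else:
            clean += 1
            if clean >= R:
                s /= 2.0              # assumed rule: halve s after R clean queries
                clean = 0
    return v, queries


# Usage: a hidden target u labels points drawn uniformly from the unit sphere.
rng = random.Random(0)
d = 5
u = random_unit_vector(d, rng)
oracle = lambda x: 1 if dot(u, x) >= 0 else -1
stream = [random_unit_vector(d, rng) for _ in range(5000)]
v, q = selective_sampling_perceptron(stream, oracle, d)
print(f"labels queried: {q} of {len(stream)}")
```

On a mistake, v·x and u·x have opposite signs, so the reflection raises v·u by 2|v·x||u·x| while keeping ‖v‖ = 1. The angle to the target therefore never grows, and the filter spends labels only near the current decision boundary.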
27

Chiu, Wan-Yu (邱婉瑜). "Developing Fuzzy DSS for Selecting Principal of Senior High School Using the Operation of 2-Tuples Fuzzy Linguistic Label." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/14620714732693151630.

Abstract:
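Master's thesis, Master's Program, Department of Information Management, National Yunlin University of Science and Technology, academic year 90 (ROC calendar).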
Senior high school principals are no longer assigned but are now selected. The aims and contributions of this thesis are: (1) to choose the criteria for selecting principals by surveying the relevant literature; (2) to survey principals, teachers, and education experts with a fuzzy linguistic label questionnaire and compare the three groups to identify the differences among them; (3) to use a new operation on 2-tuple fuzzy linguistic labels to calculate the weights of the criteria and sub-criteria for principal candidates, and to establish an algorithm for selecting principals; (4) to verify the selection procedure under the new operation process, in which each school selects five candidates in a first-stage examination and 15 experts then conduct a second-stage oral test, with an example from one senior high school used to illustrate and verify the proposed method; and (5) to develop, using 2-tuple fuzzy linguistic labels, a fuzzy decision support system (DSS) for selecting senior high school principals. The developed DSS can support educational institutions and serve as a reference for selecting senior high school principals.
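The abstract builds on the 2-tuple fuzzy linguistic representation. For reference, the sketch below implements the standard 2-tuple model of Herrera and Martínez, not the thesis's "new operation", which the abstract does not specify. Δ maps an aggregated value β ∈ [0, g] to a pair (s_i, α) with i = round(β) and α = β − i ∈ [−0.5, 0.5). Δ⁻¹ maps a pair back to i + α, and a weighted mean is computed on the Δ⁻¹ values. The five-term label set, the ratings, and the weights are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical five-term label set; the thesis's actual labels are not given in the abstract.
LABELS = ["very low", "low", "medium", "high", "very high"]

TwoTuple = Tuple[int, float]  # (label index i, symbolic translation alpha)


def delta(beta: float) -> TwoTuple:
    """Delta: map beta in [0, g] to (s_i, alpha) with alpha in [-0.5, 0.5)."""
    i = math.floor(beta + 0.5)  # round half up so alpha stays in [-0.5, 0.5)
    return i, beta - i


def delta_inv(t: TwoTuple) -> float:
    """Inverse of Delta: (s_i, alpha) -> i + alpha."""
    i, alpha = t
    return i + alpha


def weighted_mean(tuples: List[TwoTuple], weights: List[float]) -> TwoTuple:
    """2-tuple weighted average: aggregate on [0, g], then translate back."""
    total = sum(weights)
    beta = sum(w * delta_inv(t) for t, w in zip(tuples, weights)) / total
    return delta(beta)


ratings = [(3, 0.0), (4, 0.0), (2, 0.0)]  # hypothetical: high, very high, medium
weights = [5, 3, 2]                       # hypothetical criterion weights
i, alpha = weighted_mean(ratings, weights)
print(LABELS[i], round(alpha, 2))
```

The aggregate here is β = (5·3 + 3·4 + 2·2)/10 = 3.1, which translates to (s₃, 0.1), i.e. "high" plus a small positive offset. Keeping α avoids the information loss of rounding straight to a label.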
28

Lording, William James. "A deeper understanding of the Diels–Alder reaction." PhD thesis, 2010. http://hdl.handle.net/1885/11776.

Abstract:
The Diels-Alder reaction was discovered in 1928 and has become the most efficient and practical method for the synthesis of six-membered carbocyclic and heterocyclic rings. This thesis comprises three chapters of results and discussion with the Diels-Alder reaction as a theme. Chapter 2 details an investigation of endo:exo selectivity in the Diels-Alder reactions of 1,3-butadiene. Chapter 3 explores aspects of the intramolecular Diels-Alder reactions of some substituted 1,3,8-nonatrienes, and Chapter 4 describes the domino Diels-Alder reactions of 1,4-diiodo-1,3-butadiene. The Diels-Alder reaction is powerful, general, and widely used in chemical synthesis, and many Diels-Alder reactions are well known to exhibit endo selectivity, in accord with Alder’s empirical rule. The origins of endo:exo selectivity in the Diels-Alder reaction, however, are not completely understood, and there is a dearth of experimental evidence concerning the Diels-Alder reactions of the archetypal 1,3-diene, 1,3-butadiene. Chapter 2 describes a study of the Diels-Alder reactions of an isotopically labelled 1,3-butadiene with a range of simple dienophiles, allowing the endo:exo selectivities of these important reactions to be determined for the first time. The experimental data shed light on the origins of endo:exo selectivity in the Diels-Alder reaction and will serve as an important reference for future computational investigations in this area. The intramolecular Diels-Alder reaction shares many of the virtues of its intermolecular counterpart; however, its use in chemical synthesis is limited because intramolecular Diels-Alder reactivity and stereoselectivity are often governed by subtle factors and can be very difficult to predict. As part of a comprehensive experimental and computational collaboration, Chapter 3 describes an investigation of the heat- and Lewis acid-promoted intramolecular Diels-Alder reactions of some ether-tethered 1,3,8-nonatrienes.
Also presented are the results of a rate study and a kinetic isotope effect study involving the intramolecular Diels-Alder reactions of some 1,3,8-nonatrienes. The experimental data are analysed and compared with the stereoselectivities, activation barriers, and kinetic isotope effects predicted by computational modelling. Increased efficiency in chemical synthesis conserves resources, reduces waste, and saves time and money. Domino reactions are particularly efficient processes that can generate complex products from simple reactants. Chapter 4 describes an investigation of the domino Diels-Alder reactions of (1E,3E)-1,4-diiodo-1,3-butadiene with maleimide dienophiles, through which a family of bicyclo[2.2.2]oct-2-ene derivatives is produced in a single high-yielding, stereoselective synthetic step.
