Theses on the topic "Dataset selection"
See the top 27 theses and dissertations for research on the topic "Dataset selection".
Sousa, Massáine Bandeira e. "Improving accuracy of genomic prediction in maize single-crosses through different kernels and reducing the marker dataset". Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/11/11137/tde-07032018-163203/.
In plant breeding, genomic prediction (GP) is an efficient tool for increasing the efficiency of genotype selection, especially across multiple environments. The technique has the advantage of increasing genetic gain for complex traits while reducing costs. However, strategies are still needed that increase accuracy and reduce the bias of genotypic genetic values. In this context, the objectives were: i) to compare two strategies for obtaining marker subsets based on marker effects, in terms of their impact on genomic selection accuracy; ii) to compare the selective accuracy of four GP models including the genotype × environment (G×E) interaction effect and two kernels (GBLUP and Gaussian). For this, data from a rice diversity panel (RICE) and two maize datasets (HEL and USP) were used. These were evaluated for grain yield and plant height. In general, prediction accuracy and genomic selection efficiency increased when marker subsets were used. These subsets could be used to build arrays and consequently reduce genotyping costs. In addition, using the Gaussian kernel and including the G×E interaction effect increases the accuracy of genomic prediction models.
Awwad, Tarek. "Context-aware worker selection for efficient quality control in crowdsourcing". Thesis, Lyon, 2018. http://www.theses.fr/2018LYSEI099/document.
Crowdsourcing has proved its ability to address large-scale data collection tasks at a low cost and in a short time. However, due to the dependence on unknown workers, the quality of the crowdsourcing process is questionable and must be controlled. Indeed, maintaining the efficiency of crowdsourcing requires the time and cost overhead related to this quality control to stay low. Current quality control techniques suffer from high time and budget overheads and from their dependency on prior knowledge about individual workers. In this thesis, we address these limitations by proposing the CAWS (Context-Aware Worker Selection) method, which operates in two phases: in an offline phase, the correlations between the worker declarative profiles and the task types are learned. Then, in an online phase, the learned profile models are used to select the most reliable online workers for the incoming tasks depending on their types. Using declarative profiles helps eliminate any probing process, which reduces the time and the budget while maintaining the crowdsourcing quality. In order to evaluate CAWS, we introduce an information-rich dataset called CrowdED (Crowdsourcing Evaluation Dataset). The generation of CrowdED relies on a constrained sampling approach that produces a dataset which respects the requester budget and type constraints. Through its generality and richness, CrowdED also helps close the benchmarking gap present in the crowdsourcing community. Using CrowdED, we evaluate the performance of CAWS in terms of the quality, the time and the budget gain. Results show that automatic grouping is able to achieve a learning quality similar to job-based grouping, and that CAWS is able to outperform the state-of-the-art profile-based worker selection when it comes to quality, especially when strong budget and time constraints exist.
Finally, we propose CREX (CReate Enrich eXtend), which provides the tools to select and sample input tasks and to automatically generate custom crowdsourcing campaign sites in order to extend and enrich CrowdED.
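The two-phase idea in the abstract above (learn profile-to-task-type reliability offline, then pick workers online) can be illustrated with a minimal sketch. This is not the CAWS implementation: the function names, the use of a single categorical `profile` field, the plain accuracy estimate, and the `default` score for unseen profiles are all assumptions made for illustration.

```python
from collections import defaultdict


def learn_profile_models(history):
    """Offline phase: estimate accuracy per (profile, task_type) pair.

    history: iterable of (profile, task_type, was_correct) tuples (hypothetical format).
    """
    counts = defaultdict(lambda: [0, 0])  # (profile, task_type) -> [correct, total]
    for profile, task_type, was_correct in history:
        entry = counts[(profile, task_type)]
        entry[0] += int(was_correct)
        entry[1] += 1
    return {key: correct / total for key, (correct, total) in counts.items()}


def select_workers(online_workers, task_type, models, k, default=0.5):
    """Online phase: return ids of the k workers whose profile scores best for task_type."""
    scored = sorted(
        online_workers,
        key=lambda w: models.get((w["profile"], task_type), default),
        reverse=True,
    )
    return [w["id"] for w in scored[:k]]


# Toy answer history; no probing of individual workers is needed at selection time.
history = [
    ("student", "translation", True),
    ("student", "translation", True),
    ("student", "image_tagging", False),
    ("engineer", "translation", False),
    ("engineer", "image_tagging", True),
]
models = learn_profile_models(history)
workers = [
    {"id": "w1", "profile": "student"},
    {"id": "w2", "profile": "engineer"},
    {"id": "w3", "profile": "retired"},  # unseen profile falls back to `default`
]
chosen = select_workers(workers, "translation", models, k=2)
```

Because selection depends only on declared profiles, a worker who has never answered a task can still be ranked, which is the property the abstract credits for removing the probing step.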
Lingle, Jeremy Andrew. "Evaluating the Performance of Propensity Scores to Address Selection Bias in a Multilevel Context: A Monte Carlo Simulation Study and Application Using a National Dataset". Digital Archive @ GSU, 2009. http://digitalarchive.gsu.edu/eps_diss/56.
Zoghi, Zeinab. "Ensemble Classifier Design and Performance Evaluation for Intrusion Detection Using UNSW-NB15 Dataset". University of Toledo / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1596756673292254.
Silva, Wilbor Poletti. "Archaeomagnetic field intensity evolution during the last two millennia". Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/14/14132/tde-19092018-135335/.
Temporal variations of the Earth's magnetic field provide a wide range of geophysical information about the dynamics of the Earth's different layers. Because it is a planetary field, regional and global aspects can be explored, depending on the time scale of the variations. In this thesis, variations of the geomagnetic field over the last two millennia were investigated. To this end, the methods for acquiring the geomagnetic intensity recorded in archaeological materials were improved, new data were acquired, and the global archaeomagnetic database was critically assessed. Two new methodological advances are proposed here: (i) a cooling-rate correction for the microwave method, which is associated with the difference between the cooling time during the manufacture of the material and that of the heating steps during the archaeointensity experiment; (ii) a test for correcting thermoremanent anisotropy from the arithmetic mean of six samples positioned orthogonally to one another during the archaeointensity experiment. The temporal variation of magnetic intensity for South America was investigated using nine new data points, three from the ruins of the Guaraní Jesuit Missions and six from archaeological sites associated with jerked-beef (charque) farms, both located in Rio Grande do Sul, Brazil, with ages covering the last 400 years. These data, combined with the regional archaeointensity database, show that the significant influence of non-dipole components of the magnetic field in South America began at ~1800 CE.
Finally, from a re-evaluation of the global archaeointensity database, a new interpretation was proposed for the evolution of the geomagnetic axial dipole, suggesting that this component has been decreasing steadily since ~700 CE owing to the breaking of symmetry of the advective sources operating in the outer core.
Hrabina, Martin. "VÝVOJ ALGORITMŮ PRO ROZPOZNÁVÁNÍ VÝSTŘELŮ". Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-409087.
Khan, Md Jafar Ahmed. "Robust linear model selection for high-dimensional datasets". Thesis, University of British Columbia, 2006. http://hdl.handle.net/2429/31082.
Science, Faculty of
Statistics, Department of
Graduate
Mo, Dengyao. "Robust and Efficient Feature Selection for High-Dimensional Datasets". University of Cincinnati / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1299010108.
Poolsawad, Nongnuch. "Practical approaches to mining of clinical datasets : from frameworks to novel feature selection". Thesis, University of Hull, 2014. http://hydra.hull.ac.uk/resources/hull:8620.
Kurra, Goutham. "Pattern Recognition in Large Dimensional and Structured Datasets". University of Cincinnati / OhioLINK, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1014322308.
Vege, Sri Harsha. "Ensemble of Feature Selection Techniques for High Dimensional Data". TopSCHOLAR®, 2012. http://digitalcommons.wku.edu/theses/1164.
Elsilä, U. (Ulla). "Knowledge discovery method for deriving conditional probabilities from large datasets". Doctoral thesis, University of Oulu, 2007. http://urn.fi/urn:isbn:9789514286698.
Wan, Cen. "Novel hierarchical feature selection methods for classification and their application to datasets of ageing-related genes". Thesis, University of Kent, 2015. https://kar.kent.ac.uk/54761/.
Kruczyk, Marcin. "Rule-Based Approaches for Large Biological Datasets Analysis : A Suite of Tools and Methods". Doctoral thesis, Uppsala, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-206137.
Luo, Silang. "Data mining of many-attribute data : investigating the interaction between feature selection strategy and statistical features of datasets". Thesis, Heriot-Watt University, 2009. http://hdl.handle.net/10399/2276.
Fraideinberze, Antonio Canabrava. "Effective and unsupervised fractal-based feature selection for very large datasets: removing linear and non-linear attribute correlations". Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-17112017-154451/.
Given a large dataset of moderate-to-high dimensionality, how can useful patterns be identified in the data objects? In such cases, dimensionality reduction is essential to overcome a phenomenon known in the literature as the curse of high dimensionality. Although there are algorithms able to reduce the dimensionality of Terabyte-scale datasets, unfortunately they all fail to identify and eliminate non-linear correlations between attributes. This Master's work tackles the problem by exploiting concepts from Fractal Theory and massively parallel processing to present Curl-Remover, a new dimensionality reduction technique well suited to Big Data preprocessing. Its main contributions are: (a) Curl-Remover eliminates linear and non-linear correlations between attributes, as well as irrelevant attributes; (b) it does not depend on user supervision and is useful for analytical tasks in general, not only for classification; (c) it scales linearly with both the number of data objects and the number of machines used; (d) it does not require the user to suggest a number of attributes to be removed; and (e) it preserves the semantics of the attributes because it is a feature selection technique, not a feature extraction technique. Experiments were run on synthetic and real datasets containing up to 1.1 billion points, and Curl-Remover outperformed two state-of-the-art PCA-based algorithms, obtaining on average up to 8% higher accuracy.
Granato, Italo Stefanine Correia. "snpReady and BGGE: R packages to prepare datasets and perform genome-enabled predictions". Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/11/11137/tde-21062018-134207/.
The use of molecular markers allows an increase in selection efficiency as well as a better understanding of genetic resources in breeding programs. However, as the number of markers increases, the data must be processed before being made available for use. In addition, to explore genotype × environment (GE) interaction in the context of genomic prediction, some covariance matrices need to be obtained before the prediction step. Thus, in order to facilitate the introduction of genomic practices in breeding programs, two R packages were developed. The first, snpReady, was created to prepare datasets for genomic studies. This package offers three functions to achieve this goal: organizing the data and applying quality control, building the genomic relationship matrix, and estimating population genetic parameters. In addition, we present a new imputation method for missing markers. The second package is BGGE, created to generate kernels for some GE genomic interaction models and to perform genomic predictions. It consists of two functions (getK and BGGE). The first is used to create kernels for the GE models, and the second performs genomic predictions, with some features specific to GE kernels that reduce computational time. The features covered in the two packages offer a fast and straightforward option to help introduce and use genomic analyses at the various stages of a breeding program.
Brown, Ryan Charles. "Development of Ground-Level Hyperspectral Image Datasets and Analysis Tools, and their use towards a Feature Selection based Sensor Design Method for Material Classification". Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/84944.
Ph. D.
Duncan, Andrew Paul. "The analysis and application of artificial neural networks for early warning systems in hydrology and the environment". Thesis, University of Exeter, 2014. http://hdl.handle.net/10871/17569.
Neumann, Ursula [Verfasser], Dmitrij [Akademischer Betreuer] Frischmann, Dominik [Gutachter] Heider and Dmitrij [Gutachter] Frischmann. "Stability and Accuracy Analysis of a Feature Selection Ensemble for Binary Classification in Biomedical Datasets / Ursula Neumann ; Gutachter: Dominik Heider, Dmitrij Frischmann ; Betreuer: Dmitrij Frischmann". München : Universitätsbibliothek der TU München, 2018. http://d-nb.info/1154931641/34.
Yang, Zong-ming, and 楊宗明. "Applying Clonal Selection Theory in Dataset Clustering". Thesis, 2013. http://ndltd.ncl.edu.tw/handle/05851847141349430024.
國立高雄第一科技大學
資訊管理研究所
101
This thesis presents a clonal selection algorithm for the data clustering problem. A clonal selection algorithm mimics clonal selection theory, which comprises three mechanisms: clonal selection, clonal expansion, and affinity maturation via somatic hypermutation. The key feature of the theory is that a selected cell proliferates: it is cloned in proportion to its affinity rank, and its clones are hypermutated in proportion to affinity weights. The resulting clonal set competes with the existing antibody population for membership in the next generation. Finally, this study combines Clonal Selection Theory with Particle Swarm Optimization to cluster a UCI public dataset; the hybrid structure is intended to prevent premature convergence during computation. Experimental results show that the proposed hybrid systems, with their high diversity, improve data clustering performance.
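The generation loop described in this abstract (rank-based cloning, hypermutation, then competition with the existing population) can be sketched on a toy problem. This is an illustration only, not the thesis's hybrid clustering method: it optimizes a one-dimensional objective instead of cluster assignments, omits the Particle Swarm Optimization component, and assumes a mutation step that shrinks for better-ranked antibodies, as in common CLONALG-style formulations. The function name and the `n_clones` and `scale` parameters are hypothetical.

```python
import random


def clonal_selection_step(population, affinity, rng, n_clones=5, scale=1.0):
    """One generation: rank-based cloning, hypermutation, then competition."""
    ranked = sorted(population, key=affinity, reverse=True)
    clones = []
    for rank, antibody in enumerate(ranked, start=1):
        # Better-ranked antibodies receive more clones (at least one each).
        copies = max(1, n_clones // rank)
        # Assumption: better-ranked antibodies mutate with a smaller step.
        step = scale * rank / len(ranked)
        clones.extend(antibody + rng.gauss(0.0, step) for _ in range(copies))
    # The clonal set competes with the existing population for membership.
    survivors = sorted(ranked + clones, key=affinity, reverse=True)
    return survivors[: len(population)]


rng = random.Random(0)


def affinity(x):
    """Toy affinity: highest at x = 3."""
    return -(x - 3.0) ** 2


population = [rng.uniform(-10, 10) for _ in range(6)]
best_before = max(affinity(x) for x in population)
for _ in range(20):
    population = clonal_selection_step(population, affinity, rng)
best_after = max(affinity(x) for x in population)
```

Because the previous population takes part in the competition, the best affinity never decreases between generations. For clustering, each antibody would instead encode a candidate set of cluster centres and the affinity would be a clustering quality measure.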
Lutu, P. E. N. (Patricia Elizabeth Nalwoga). "Dataset selection for aggregate model implementation in predictive data mining". Thesis, 2010. http://hdl.handle.net/2263/29486.
Thesis (PhD)--University of Pretoria, 2010.
Computer Science
unrestricted
Chen, Yen-Tze, and 陳彥澤. "Comparison of single feature selection and fusion feature selection for medical dataset". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/v55952.
元智大學
資訊工程學系
106
In an era of rapid development in information technology, tens of thousands of documents are generated every day, and large accumulations of information are found everywhere. Businesses therefore value user data and use data analysis software to predict future consumer preferences; in turn, they place top-selling merchandise in prominent positions and pair poorly selling items with sales promotions to increase sales. Big data refers to data so large that database systems cannot store, compute, and process it within an effective period of time and turn it into interpretable information through analysis. In recent years, experts in the field have therefore worked on the problem of overly large data, hoping to complete analysis in limited time. The research area of this thesis is feature selection. The main function of feature selection is to delete redundant or repetitive data in the dataset, thereby reducing the complexity and time of the analysis. The research topic is the difference between single feature selection and fusion feature selection, examined to determine whether fusion feature selection yields better prediction model accuracy than single feature selection. For practical application, medical datasets from the UCI Machine Learning Repository and KDD 2008 are used as the initial data for the experiment. The thesis also considers differences between classification techniques in medical statistics and classification techniques in the information field.
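The contrast between single and fusion feature selection can be made concrete with a small sketch. The abstract does not say how the individual selections are fused, so union and intersection of the selected feature sets are assumed here; the feature names, scores, and function names are invented for illustration.

```python
def top_k(scores, k):
    """Single feature selection: the k feature names with the highest score."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def fuse(selections, mode="union"):
    """Fusion feature selection: combine several single-method selections."""
    sets = list(selections)
    if mode == "union":
        return set.union(*sets)
    if mode == "intersection":
        return set.intersection(*sets)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical relevance scores from two different selection methods.
method_a = {"age": 0.9, "bmi": 0.7, "bp": 0.4, "hr": 0.1}
method_b = {"age": 0.2, "bmi": 0.8, "bp": 0.1, "hr": 0.6}

single_a = top_k(method_a, 2)
single_b = top_k(method_b, 2)
fused_union = fuse([single_a, single_b], mode="union")
fused_common = fuse([single_a, single_b], mode="intersection")
```

Union keeps any feature that at least one method rates highly, while intersection keeps only features on which the methods agree. Either fused set can then be passed to the same classifier as the single selections to compare prediction accuracy.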
Fu, JuiHsi, and 傅瑞曦. "Sample Selection on Labeling Imbalanced Datasets and Learning Efficient Classifiers". Thesis, 2013. http://ndltd.ncl.edu.tw/handle/60300828589932265351.
國立中正大學
資訊工程研究所
101
When building a classification system, two practical issues need careful attention. First, it is difficult to collect a complete dataset in a short period of time. Second, labeling collected data by human effort is expensive. In this thesis, we study further research issues in active learning, which aims to label informative samples, and in incremental learning, which generates the classifier from sequential datasets. We therefore concentrate on designing approaches to label imbalanced datasets and to learn efficient classifiers. Our main concept is to select informative samples used for labeling data or for adjusting classifiers. Our active learning approaches aim to query unlabeled samples without being affected by the imbalanced classification problem. They select specified labeled samples to determine whether an unlabeled sample is queried or not. Moreover, the objective of our incremental learning approaches is to select informative samples to adjust the classifier efficiently. Those samples may be misclassified or classified with low confidence. We also consider the case in which the sequentially collected dataset is still insufficient. In this condition, we select relevant labeled samples to generate specific classifiers for the target sample. In our experiments, the approaches are evaluated on synthetic datasets and on real-world datasets from the UCI repository and the campus of National Chung Cheng University. The experimental results and theoretical analysis show that our approaches can effectively handle the practical issues in labeling data and adjusting classifiers.
YEH, CHENG-HUA, and 葉政華. "A Study on Gene Selection and Classification with Microarray Datasets". Thesis, 2003. http://ndltd.ncl.edu.tw/handle/26638596245660080051.
國立臺灣大學
資訊工程學研究所
91
This thesis discusses two essential issues in microarray data analysis: gene selection in tumor classification and learning gene functional classes. The first issue concerns how to select the genes that are informative for a specific classification problem from the large number of genes in a microarray dataset. This thesis proposes two clustering-based methods. Experimental results reveal that the proposed methods are able to identify a set of informative genes when applied to a challenging tumor dataset. The second issue studied in this thesis is identifying correlations between clusters of co-expressed genes in a microarray dataset and co-regulated cell activities. This thesis investigates the effects of exploiting supervised learning algorithms to deal with this problem. Experimental results show that the novel RBF-network-based learning algorithm recently proposed by our research team and support vector machines (SVM) can deliver far better results than the other well-known approaches included in this study. Nevertheless, experimental results also show that the supervised-learning-based approach can be applied successfully to only a few classes of co-regulated genes. In response to this observation, this thesis proposes two methods to improve the recall rates of the supervised-learning-based approach.
Syu, Jen-Hui, and 徐仁徽. "Target Genes Selection for Human Colon Cancer Datasets Based on Data Mining Algorithm". Thesis, 2013. http://ndltd.ncl.edu.tw/handle/74213282671290750613.
國立中興大學
基因體暨生物資訊學研究所
101
In 2013, the Ministry of Health and Welfare announced the ten leading causes of death. Malignancies have ranked first among them for 31 consecutive years, and cancers of the colon, rectum, and anus accounted for 5131 deaths. Colon cancer is the most commonly diagnosed cancer in Taiwan and the third most common type of cancer in both sexes. When recognized at an early stage, it is often reversible. If colon cancer is detected early and its risks are addressed, life expectancy lengthens. The risk of developing colon cancer increases with age, and many risk factors raise the chance of developing it, for example dietary habits, gender, family history, and hereditary factors. In the last few years, the average age of patients has been falling. This study developed a novel system for target gene identification, prediction analysis, and medical data mining, which uses target genes to build a database and integrates artificial neural network and decision tree analysis techniques to further study the target genes of colon cancer patients. The aim is to establish a diagnostic system that reduces the misdiagnosis rate and health care costs. Our experimental results show that the accuracy rates of the back-propagation neural network and the random tree are 62% and 95.6%, respectively. Finally, this thesis builds a visual web interface system, in the hope of letting medical staff observe gene expression more intuitively and quickly during diagnosis. The study aims to support gene-related problem solving and medical decision making, motivated by efforts to improve human health.
HENDRICK and 林高民. "Using Biological Feature Selection Approach on Large-Scale Integrated Microarray Datasets for Colorectal Cancer Prediction". Thesis, 2017. http://ndltd.ncl.edu.tw/handle/phj8te.
慈濟大學
醫學資訊學系碩士班
105
Colorectal cancer is one of the most common cancers and has the fourth highest mortality rate in the world. Microarrays can be used to gather information on gene expression differences from tissue samples, which is useful for diagnosing colorectal cancer at the molecular level. A highly accurate prediction method is needed to deal with the high incidence of colorectal cancer. However, such a method still faces challenges, from selecting appropriate features and the number of selected features to the choice of classification algorithm. Large-scale studies, comparison between studies, and selection of informative data are therefore crucial for making this method clinically applicable. Here, we propose a systematic method for colorectal cancer prediction, covering data collection, data preprocessing, data merging, feature selection, and classification, that applies large-scale integrated microarray datasets together with biological feature selection to obtain a highly accurate prediction method that can be used clinically. We integrated (i) 31 curated colorectal microarray datasets with 2443 cancer and 361 normal samples, (ii) variance stabilization normalization as the preprocessing method, (iii) empirical Bayes as the batch effect removal method with two factors (batch and phenotype information), (iv) biological feature selection using gene set enrichment analysis and gene ontology enrichment analysis based on biological knowledge of gene function, and (v) a support vector machine as the classification algorithm. As a result, our method provides a more reliable prediction without losing high accuracy (around 98%) and finds that correlated genes in the inflammatory response play an important role in the development of adenomatous polyps, which can lead to colorectal cancer.
In addition, our method can be used clinically because it provides a large sample size along with a broad comparison across different microarray studies.