Dissertations / Theses on the topic 'K-fold validation'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 16 dissertations / theses for your research on the topic 'K-fold validation.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Sood, Radhika. "Comparative Data Analytic Approach for Detection of Diabetes." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1544100930937728.
Ording, Marcus. "Context-Sensitive Code Completion : Improving Predictions with Genetic Algorithms." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-205334.
In the field of context-sensitive code completion, there is a need for precise predictive models in order to suggest useful code completions. The traditional approach to optimizing the performance of code-completion systems is to empirically evaluate the effect of each system parameter individually and fine-tune the parameters. This work presents a genetic algorithm that can optimize the system parameters with a degree of freedom equal to the number of parameters to optimize. The study evaluates the effect of the optimized parameters on the predictive quality of the studied code-completion system. The previous evaluation of the baseline system was extended to also include model size and inference time. The results show that the genetic algorithm can improve the predictive quality of the studied code-completion system. Compared with the baseline system, the improved system correctly recognizes 1 in 10 additional code patterns that were previously unseen. The improvement in predictive quality does not have a significant impact on the system, as the inference time remains below 1 ms for both systems.
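The parameter-optimization idea in the abstract above can be illustrated with a minimal genetic algorithm. This is a generic, dependency-free sketch, not the thesis's implementation: the fitness function, population size, and mutation step below are invented stand-ins for the code-completion system's parameters and predictive-quality measure.

```python
import random

def genetic_optimize(fitness, n_params, pop_size=30, generations=40, rng=None):
    """Minimal real-valued GA: truncation selection, uniform crossover,
    Gaussian mutation. Maximises `fitness` over vectors in [0, 1]^n_params."""
    rng = rng or random.Random()
    pop = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the top half of the population (elitism preserves the best).
        survivors = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            i = rng.randrange(n_params)           # mutate one gene
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy stand-in for "predictive quality as a function of system parameters":
# best quality at a known target point in parameter space.
target = [0.3, 0.7, 0.5]
best = genetic_optimize(lambda p: -sum((x - t) ** 2 for x, t in zip(p, target)),
                        n_params=3, rng=random.Random(0))
```

Each parameter evolves independently of the others, which is what gives the search a degree of freedom per parameter, in contrast to tuning one parameter at a time.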
Piják, Marek. "Klasifikace emailové komunikace." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. http://www.nusl.cz/ntk/nusl-385889.
Birba, Delwende Eliane. "A Comparative study of data splitting algorithms for machine learning model selection." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-287194.
Data splitting is commonly used in machine learning to divide data into a training, test, or validation set. This approach allows us to find hyperparameters for the model and also to estimate generalization performance. In this research we performed a comparative analysis of different data splitting algorithms on both real and simulated data. Our main objective was to investigate how the choice of data splitting algorithm can improve the estimate of generalization performance. The data splitting algorithms used in this study were variants of k-fold cross validation, Kennard-Stone (KS), SPXY (sample set partitioning based on joint x-y distances), and the bootstrap algorithm. Each algorithm was used to split the data into two different sets: training and validation data. We then analyzed the different data splitting algorithms based on the generalization performance estimated from the validation set and from an external test set. From the results we noted that what matters most for good generalization is the size of the data. For all data splitting algorithms applied to small data sets, the gap between the performance estimated on the validation set and on the test set was significant. We noted, however, that the gap shrank when more data were available for training or validation. Too much or too little data in the training set can also lead to poor performance. This highlights the importance of a proper balance between the sizes of the training and validation sets. In our study, KS and SPXY were the worst-performing algorithms. These methods select the most representative instances to train the model, and non-representative instances are left for estimating model performance.
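Of the splitting algorithms named above, Kennard-Stone is the least standard, so a sketch of the idea may help: it greedily selects the most spread-out (representative) instances for training, leaving the rest for validation. This is a minimal illustration with invented 2-D points; the function names are mine, not the thesis's.

```python
import math

def kennard_stone_split(points, n_train):
    """Kennard-Stone selection: seed with the two mutually farthest points,
    then repeatedly add the point whose nearest selected neighbour is farthest."""
    n = len(points)
    # Seed pair: the two points with the largest pairwise distance.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: math.dist(points[p[0]], points[p[1]]))
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_train:
        nxt = max(remaining,
                  key=lambda k: min(math.dist(points[k], points[s])
                                    for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining  # training indices, validation indices

train, valid = kennard_stone_split([(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)], 3)
```

Because the extremes of the data always end up in the training set, the held-out instances are systematically "easy" interior points, which is one plausible reason the abstract reports optimistic validation estimates for KS and SPXY.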
Martins, Natalie Henriques. "Modelos de agrupamento e classificação para os bairros da cidade do Rio de Janeiro sob a ótica da Inteligência Computacional: Lógica Fuzzy, Máquinas de Vetores Suporte e Algoritmos Genéticos." Universidade do Estado do Rio de Janeiro, 2015. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=9502.
From 2011 onward, events of great significance for the city of Rio de Janeiro have taken place and will continue to take place, such as the United Nations Rio+20 conference and sporting events of worldwide importance (the FIFA World Cup, the Olympic Games, and the Paralympic Games). These events attract financial resources to the city, as well as generating jobs, infrastructure improvements, and real-estate appreciation, both of land and of buildings. When choosing a residential property in a given neighbourhood, buyers evaluate not only the property itself but also the urban amenities available in the locality. In this context, it was possible to define a qualitative linguistic interpretation of the neighbourhoods of the city of Rio de Janeiro by integrating three Computational Intelligence techniques for the evaluation of benefits: Fuzzy Logic, Support Vector Machines, and Genetic Algorithms. The database was built with information from the web and from government institutes, covering the cost of residential properties and the benefits and weaknesses of the city's neighbourhoods. Fuzzy Logic was first implemented as an unsupervised clustering model using Ellipsoidal Rules via the Extension Principle with the Mahalanobis distance, inferentially configuring groups with linguistic labels (Good, Fair, and Poor) according to twelve urban characteristics. From this discrimination, it became feasible to use Support Vector Machines combined with Genetic Algorithms as a supervised method, in order to search for the smallest subset of the clustering variables that best classifies the neighbourhoods (principle of parsimony).
The analysis of the error rates allowed choosing the best classification model with a reduced variable space, resulting in a subset containing information on: HDI, number of bus lines, educational institutions, average price per square metre, open-air spaces, entertainment venues, and crime. The modelling that combined the three Computational Intelligence techniques ranked the neighbourhoods of Rio de Janeiro with acceptable error rates, supporting decision making in the purchase and sale of residential properties. Regarding public transport in the city, it was apparent that the road network is still the dominant mode.
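The Mahalanobis distance used in the fuzzy clustering step above measures distance relative to the spread and correlation of the features, so unequally scaled urban indicators are compared fairly. A minimal 2-D illustration (my own sketch, not the thesis's code):

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance in 2-D via the explicit inverse covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    # Quadratic form dx^T * Sigma^{-1} * dx
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

# With an elongated covariance, displacement along the high-variance axis
# counts for less than the same displacement across it.
cov = [[4.0, 0.0], [0.0, 1.0]]
d_along = mahalanobis_2d((2, 0), (0, 0), cov)   # along the wide axis -> 1.0
d_across = mahalanobis_2d((0, 2), (0, 0), cov)  # across it -> 2.0
```

This is why ellipsoidal (rather than spherical) cluster rules suit data whose twelve urban characteristics have very different variances.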
Luo, Shan. "Advanced Statistical Methodologies in Determining the Observation Time to Discriminate Viruses Using FTIR." Digital Archive @ GSU, 2009. http://digitalarchive.gsu.edu/math_theses/86.
Tandan, Isabelle, and Erika Goteman. "Bank Customer Churn Prediction : A comparison between classification and evaluation methods." Thesis, Uppsala universitet, Statistiska institutionen, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-411918.
Radeschnig, David. "Modelling Implied Volatility of American-Asian Options : A Simple Multivariate Regression Approach." Thesis, Mälardalens högskola, Akademin för utbildning, kultur och kommunikation, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-28951.
Bodin, Camilla. "Automatic Flight Maneuver Identification Using Machine Learning Methods." Thesis, Linköpings universitet, Reglerteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-165844.
Full textPo-YangYeh and 葉柏揚. "A Study on the Appropriateness of Repeating K-fold Cross Validation." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/6jc74q.
National Cheng Kung University
Department of Industrial and Information Management
Academic year 105 (2016/17)
K-fold cross validation is a popular approach for evaluating the performance of classification algorithms. The variance of the accuracy estimate resulting from this approach is generally large, which makes inference conservative. Several studies have therefore suggested repeatedly performing K-fold cross validation to reduce the variance. Most of them did not consider the correlation among the repetitions of K-fold cross validation, and hence the variance could be underestimated. The purpose of this thesis is to study the appropriateness of repeating K-fold cross validation. We first investigate whether the accuracy estimates obtained from the repetitions of K-fold cross validation can be assumed to be independent. The K-nearest-neighbor algorithm with K = 1 is used to analyze the dependency relationships between the predictions of two repetitions of K-fold cross validation, and statistical methods are proposed to test the strength of these relationships. The experimental results on twenty data sets show that the predictions in two repetitions of K-fold cross validation are generally highly correlated, and that the correlation grows as the number of folds increases. The results of a simulation study suggest adopting K-fold cross validation with a small number of repetitions and a large number of folds.
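Repeated K-fold cross validation, the procedure examined above, can be sketched as follows. The 1-nearest-neighbour classifier matches the one used in the thesis, but the synthetic data, fold count, and number of repetitions below are invented for illustration.

```python
import random
import statistics

def kfold_indices(n, k, rng):
    """One random partition of 0..n-1 into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def one_nn_accuracy(data, k, rng):
    """One repetition of k-fold CV with a 1-nearest-neighbour classifier."""
    folds = kfold_indices(len(data), k, rng)
    correct = 0
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        for i in test:
            x, y = data[i]
            nearest = min(train, key=lambda j: abs(data[j][0] - x))
            correct += data[nearest][1] == y
    return correct / len(data)

rng = random.Random(0)
# Two well-separated classes on the real line.
data = ([(rng.gauss(0, 1), 0) for _ in range(40)]
        + [(rng.gauss(4, 1), 1) for _ in range(40)])
# Repetition = rerunning k-fold CV with a fresh random fold assignment.
estimates = [one_nn_accuracy(data, 10, rng) for _ in range(5)]
mean_acc = statistics.mean(estimates)
```

Note that every repetition reuses the same data, which is exactly why, as the abstract argues, the repetitions cannot be treated as independent samples when estimating the variance of `mean_acc`.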
Jing-Tai Tsai and 蔡敬泰. "Dependency Analysis of the Accuracy Estimates Obtained from k-fold Cross Validation." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/ctp6zg.
Chang, Chih-Hsiang, and 張智翔. "Hollow Ball Screw Nut Preload Diagnosis by Support Vector Machine with K-Fold Cross Validation." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/k99q42.
National Changhua University of Education
Department of Mechatronics Engineering
Academic year 106 (2017/18)
The purpose of this thesis is to develop a diagnosis method for ball screw nuts with different preloads by analyzing signals from different operating conditions. The research focuses on diagnosing the status of the machine tool's feed drive from a short warm-up run before manufacturing, since it takes a long time for an industrial ball screw to reach a failure mode. The experiments vary the ball nut preload among 2%, 4%, and 6% of the maximum dynamic load. Motor load current, linear encoder signals, and motor rotation speed signals were acquired and fed to a Support Vector Machine (SVM). A linear kernel function and a radial basis function (RBF) kernel function were used for the classification hyperplane, and k-fold cross validation was used to tune the SVM parameters. Experimental results show that different ball nut preload statuses can be distinguished by feeding motor current, linear scale, and motor rotation speed signals into an SVM tuned with k-fold cross validation, and that the resulting early-warning module for ball screw failure is effective and promising.
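The tuning loop described above, k-fold cross validation used to select classifier parameters, has the following overall shape. To keep the sketch dependency-free it tunes a k-nearest-neighbour classifier's neighbourhood size rather than SVM kernel parameters; the structure of the grid search is the same, and all names and data below are invented.

```python
import random

def cross_val_accuracy(data, k_folds, classify, rng):
    """Mean accuracy of `classify` over one k-fold partition of `data`."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k_folds] for i in range(k_folds)]
    correct = 0
    for f, test in enumerate(folds):
        train = [data[i] for g, fold in enumerate(folds) if g != f for i in fold]
        correct += sum(classify(train, data[i][0]) == data[i][1] for i in test)
    return correct / len(data)

def knn_classify(k):
    """A k-NN classifier on 1-D features, returned as a closure."""
    def classify(train, x):
        neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        labels = [y for _, y in neighbours]
        return max(set(labels), key=labels.count)
    return classify

rng = random.Random(1)
data = ([(rng.gauss(0, 1), 0) for _ in range(30)]
        + [(rng.gauss(3, 1), 1) for _ in range(30)])
# Grid search: every candidate is scored on the SAME folds (same seed),
# and the one with the best 5-fold CV accuracy is selected.
scores = {k: cross_val_accuracy(data, 5, knn_classify(k), random.Random(2))
          for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
```

For an SVM the grid would instead range over the kernel type and its parameters (e.g. the RBF width), but the fold construction and the argmax over CV scores are unchanged.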
Wu, Jian-Kuen, and 吳建昆. "The impact of stratification on the performance of classification algorithms evaluated by k-fold cross validation." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/xkvvzs.
National Cheng Kung University
Institute of Information Management
Academic year 105 (2016/17)
K-fold cross validation is one of the accuracy estimation methods used in many kinds of experimental research, yet stratification is seldom performed to make the data in each partition more representative. Stratification has the advantage of reducing the variance of the estimator and thus better estimating the true accuracy. This research examines stratification and imbalanced data sets from a different perspective. General data sets are used to develop new algorithms based on standard stratification for K-fold cross validation and to investigate the estimator in terms of bias and variance. Imbalanced data sets are used to discuss the performance of stratification in terms of recall, precision, and other measures when a class value is rare. Many studies recommend their algorithms without an appropriate parametric method for statistical comparison. The purpose of this study is therefore to compare these stratification methods under the same conditions, using the decision tree and k-nearest-neighbors algorithms, through reasonable statistical comparisons. The results demonstrate that, on both general and imbalanced data sets, single or multiple, the estimates produced by K-fold cross validation are close whether or not stratification is applied. Furthermore, when time complexity matters and the estimator is stable, standard stratification can be used with K-fold cross validation. With advanced stratification, which takes the features of the data into account, the estimator is relatively more stable than with standard stratification.
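Standard stratification, as discussed above, assigns instances to folds class by class so that each fold preserves the overall class proportions. A minimal sketch, assuming a simple shuffle-then-deal round-robin assignment (one common way to implement it):

```python
import random

def stratified_kfold(labels, k, rng):
    """Assign each instance index to one of k folds so that every fold keeps
    roughly the overall class proportions: shuffle within each class, then
    deal the class members round-robin across the folds."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for members in by_class.values():
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds

rng = random.Random(0)
labels = [0] * 90 + [1] * 10          # imbalanced: 10% rare class
folds = stratified_kfold(labels, 5, rng)
per_fold_rare = [sum(labels[i] == 1 for i in f) for f in folds]
```

On this imbalanced example every fold receives exactly two rare-class instances, whereas an unstratified random split could easily leave some folds with none, which is what inflates the variance of recall and precision estimates for the rare class.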
Chiao-Ying Lin and 林巧盈. "A study on the selection error rate of classification algorithms evaluated by k-fold cross validation." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/23699989925707105417.
National Cheng Kung University
Institute of Information Management
Academic year 102 (2013/14)
The performance of a classification algorithm is generally evaluated by K-fold cross validation to find the one that has the highest accuracy. The model induced from all available data by the best classification algorithm, called the full sample model, is then used for prediction and interpretation. Since there are no extra data to evaluate the full sample model resulting from the best algorithm, its prediction accuracy can be lower than that of the full sample model induced by another classification algorithm; this is called a selection error. This study designs an experiment to calculate and estimate the selection error rate, and proposes a new model for reducing it. The classification algorithms considered in this study are decision tree, naïve Bayesian classifier, logistic regression, and support vector machine. The experimental results on 30 data sets show that the actual and estimated selection error rates can differ greatly in several cases. The new model, which has the median accuracy, can reduce the selection error rate without sacrificing prediction accuracy.
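The selection error described above arises because cross-validation estimates are noisy: the algorithm with the best estimated accuracy need not be the one with the best true accuracy. A small Monte-Carlo sketch (the true accuracies, evaluation size, and trial count are invented) shows how often a slightly worse algorithm wins the comparison:

```python
import random

def simulate_selection_error(true_accs, n_eval, trials, rng):
    """Estimate how often selecting by a noisy accuracy estimate picks an
    algorithm whose true accuracy is below the best. Each estimate is
    modelled as a binomial sample over n_eval held-out predictions."""
    best_true = max(true_accs)
    errors = 0
    for _ in range(trials):
        estimates = [sum(rng.random() < p for _ in range(n_eval)) / n_eval
                     for p in true_accs]
        chosen = max(range(len(true_accs)), key=lambda i: estimates[i])
        errors += true_accs[chosen] < best_true
    return errors / trials

rng = random.Random(0)
# Two algorithms whose true accuracies differ by only two percentage points.
rate = simulate_selection_error([0.80, 0.78], n_eval=100, trials=2000, rng=rng)
```

With only 100 evaluation instances, the noise in each estimate is comparable to the two-point gap between the algorithms, so the weaker algorithm is selected in a substantial fraction of trials; larger evaluation sets shrink this rate.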
Ying-Yi Chen and 陳映伊. "A study for investigating classification accuracy and consistency between K-fold cross validation and complete-data model." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/00015703419684582128.
National Cheng Kung University
Institute of Information Management
Academic year 101 (2012/13)
In classification applications, analysts generally use K-fold cross validation to find the classifier with the best performance. That classifier then generates a learning model from all available data for prediction and interpretation. K-fold cross validation randomly divides all available data into K folds, and every fold is in turn used for testing the model learned from the other K-1 folds. The average of the accuracies resulting from the K folds is an estimate of the prediction accuracy of the model learned from all available data. However, this procedure does not guarantee that the model induced from all available data by the best classifier under K-fold cross validation will have the highest prediction accuracy on new data with respect to the other classifiers. This study first designs an experiment to investigate whether the mean accuracy resulting from K-fold cross validation is a good estimate of the prediction accuracy of the model learned from all available data. An inconsistent rate is then introduced to measure the prediction consistency between the model learned from all available data and the K models induced by K-fold cross validation. When the inconsistent rate is small, using the model learned from all available data for prediction and interpretation is appropriate. The experimental results on 30 data sets indicate that the average of the mean accuracy resulting from K-fold cross validation and the average prediction accuracy on new data of the model induced from all available data are generally not significantly different. However, since the probability that these two accuracies differ by more than one percent is approximately 0.60, the probability of choosing a classifier with a lower prediction accuracy on new data is generally larger than 0.3.
The inconsistent rate shows that, among the four classifiers adopted in this study, decision tree learning is the worst choice for generating a model from all available data for prediction and interpretation.
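The inconsistent rate introduced above compares the predictions of the full-sample model with those of the K models trained during cross validation, on the same inputs. A toy sketch with a one-dimensional threshold classifier (my stand-in for illustration, not one of the thesis's four classifiers):

```python
import random

def fit_threshold(train):
    """Toy classifier: decision threshold at the midpoint of the class means."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def predict(threshold, x):
    return int(x > threshold)

rng = random.Random(3)
data = ([(rng.gauss(0, 1), 0) for _ in range(50)]
        + [(rng.gauss(2, 1), 1) for _ in range(50)])
new_x = [rng.gauss(1, 2) for _ in range(200)]   # fresh inputs to compare on

full_model = fit_threshold(data)                # model from ALL available data
idx = list(range(len(data)))
rng.shuffle(idx)
folds = [idx[i::10] for i in range(10)]

# Inconsistent rate: how often a fold model disagrees with the full-sample model.
disagree = 0
for f in range(10):
    fold_train = [data[i] for g, fold in enumerate(folds) if g != f for i in fold]
    fold_model = fit_threshold(fold_train)
    disagree += sum(predict(fold_model, x) != predict(full_model, x)
                    for x in new_x)
inconsistent_rate = disagree / (10 * len(new_x))
```

Because each fold model sees 90% of the data, its threshold sits close to the full-sample one and disagreements are confined to inputs near the decision boundary; an unstable learner such as a decision tree would disagree far more often, matching the abstract's conclusion.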
Yi-Yin Huang and 黃宜音. "A study on the new models for improving the selection error rate among classification algorithms evaluated by k-fold cross validation." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/88651254533355280085.
National Cheng Kung University
Institute of Information Management
Academic year 103 (2014/15)
The performance of a classification algorithm is generally evaluated by K-fold cross validation to find the one that has the highest accuracy. The model induced from all available data by the best classification algorithm, called the full sample model, is then used for prediction and interpretation. Since there are no extra data to evaluate the full sample model resulting from the best algorithm, its prediction accuracy can be lower than that of the full sample model induced by another classification algorithm; this is called a selection error. The experimental results of some previous studies showed that the actual and the estimated selection error rates can differ greatly in several cases. This study repeatedly performs the experiment to stabilize the estimated selection error rates, and proposes new models for reducing the selection error rate without sacrificing prediction accuracy. The classification algorithms considered in this study are decision tree, naïve Bayesian classifier, logistic regression, and support vector machine. The study investigates the impact of the number of classification algorithms, the number of folds, and the characteristics of the data sets on the selection error rate, and proposes three methods to generate new models for reducing it. The experimental results on thirty data sets show that the selection error rate increases as the number of classification algorithms increases, while the number of folds does not affect it. The new models proposed in this study can effectively reduce the selection error rate for interpreting learning results.