Academic literature on the topic 'K-fold validation'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'K-fold validation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "K-fold validation"

1. Wong, Tzu-Tsung, and Po-Yang Yeh. "Reliable Accuracy Estimates from k-Fold Cross Validation." IEEE Transactions on Knowledge and Data Engineering 32, no. 8 (August 1, 2020): 1586–94. http://dx.doi.org/10.1109/tkde.2019.2912815.

2. Soper, Daniel S. "Greed Is Good: Rapid Hyperparameter Optimization and Model Selection Using Greedy k-Fold Cross Validation." Electronics 10, no. 16 (August 16, 2021): 1973. http://dx.doi.org/10.3390/electronics10161973.

Abstract:
Selecting a final machine learning (ML) model typically occurs after a process of hyperparameter optimization in which many candidate models with varying structural properties and algorithmic settings are evaluated and compared. Evaluating each candidate model commonly relies on k-fold cross validation, wherein the data are randomly subdivided into k folds, with each fold being iteratively used as a validation set for a model that has been trained using the remaining folds. While many research studies have sought to accelerate ML model selection by applying metaheuristic and other search methods to the hyperparameter space, no consideration has been given to the k-fold cross validation process itself as a means of rapidly identifying the best-performing model. The current study rectifies this oversight by introducing a greedy k-fold cross validation method and demonstrating that greedy k-fold cross validation can vastly reduce the average time required to identify the best-performing model when given a fixed computational budget and a set of candidate models. This improved search time is shown to hold across a variety of ML algorithms and real-world datasets. For scenarios without a computational budget, this paper also introduces an early stopping algorithm based on the greedy cross validation method. The greedy early stopping method is shown to outperform a competing, state-of-the-art early stopping method both in terms of search time and the quality of the ML models selected by the algorithm. Since hyperparameter optimization is among the most time-consuming, computationally intensive, and monetarily expensive tasks in the broader process of developing ML-based solutions, the ability to rapidly identify optimal machine learning models using greedy cross validation has obvious and substantial benefits to organizations and researchers alike.
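To make the greedy scheme concrete, here is a minimal sketch reconstructed from the abstract alone (the function name, the ordering rule for unevaluated candidates, and the fixed fold list are my assumptions, not Soper's published algorithm): each unit of evaluation budget is spent on the candidate whose running mean fold accuracy is currently best.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold

def greedy_kfold_select(candidates, X, y, k=10, budget=30, seed=0):
    """Greedy k-fold CV: spend each fold evaluation on the candidate
    whose running mean fold accuracy is currently best."""
    folds = list(KFold(n_splits=k, shuffle=True, random_state=seed).split(X))
    scores = [[] for _ in candidates]   # fold accuracies per candidate
    done = [0] * len(candidates)        # folds evaluated per candidate
    for _ in range(budget):
        live = [i for i in range(len(candidates)) if done[i] < k]
        if not live:
            break
        # unevaluated candidates score +inf, so each gets tried once first
        best = max(live, key=lambda i: np.mean(scores[i]) if scores[i] else np.inf)
        tr, te = folds[done[best]]
        model = clone(candidates[best]).fit(X[tr], y[tr])
        scores[best].append(model.score(X[te], y[te]))
        done[best] += 1
    means = [np.mean(s) if s else float("-inf") for s in scores]
    return int(np.argmax(means)), means
```

Called with a list of unfitted scikit-learn estimators, e.g. `greedy_kfold_select([SVC(C=c) for c in (0.1, 1, 10)], X, y)`, it returns the index of the winning candidate and the running mean accuracies.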
3. Wang, Ju E., and Jian Zhong Qiao. "Parameter Selection of SVR Based on Improved K-Fold Cross Validation." Applied Mechanics and Materials 462-463 (November 2013): 182–86. http://dx.doi.org/10.4028/www.scientific.net/amm.462-463.182.

Abstract:
This article first uses SVM to forecast a cashmere price time series. The forecasting result depends mainly on parameter selection, which is normally based on k-fold cross validation. Standard k-fold cross validation is suited to classification; in this paper it is improved to ensure that only older data are used to forecast later data, which improves prediction accuracy. The cashmere price time series is used to train a mathematical model based on SVM, with the model parameters selected by the improved cross validation, and the model then forecasts the price of cashmere. The simulation results show that the support vector machine achieves high fitting precision on small samples, and that forecasting cashmere prices with SVM is feasible.
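The improvement the abstract describes, allowing only older data to forecast later data, corresponds to a forward-chaining split rather than a random k-fold one. A minimal stand-in using scikit-learn's TimeSeriesSplit (not the authors' own modification; the toy series and SVR settings are placeholders):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.svm import SVR

# toy series: predict the next value from the five previous ones
series = np.sin(np.linspace(0, 20, 300))
X = np.array([series[i:i + 5] for i in range(len(series) - 5)])
y = series[5:]

# every split trains only on indices strictly earlier than the test indices
tscv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(SVR(C=1.0, epsilon=0.01), X, y, cv=tscv, scoring="r2")
print(scores.mean())
```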
4. Alptekin, Ahmet, and Olcay Kursun. "Miss One Out: A Cross-Validation Method Utilizing Induced Teacher Noise." International Journal of Pattern Recognition and Artificial Intelligence 27, no. 07 (November 2013): 1351003. http://dx.doi.org/10.1142/s0218001413510038.

Abstract:
Leave-one-out (LOO) and its generalization, K-fold, are among the most well-known cross-validation methods, which divide the sample into a number of folds, each of which is, in turn, left out for testing while the other parts are used for training. In this study, as an extension of this idea, we propose a new cross-validation approach that we call miss-one-out (MOO), which mislabels the example(s) in each fold and keeps this fold in the training set as well, rather than leaving it out as LOO does. MOO then tests whether the trained classifier can correct the erroneous labels of these training samples. In principle, having only one fold deliberately labeled incorrectly should have only a small effect on a classifier that uses this bad fold along with the K − 1 good folds, and this can be utilized as a generalization measure of the classifier. Experimental results on a number of benchmark datasets and three real bioinformatics datasets show that MOO can better estimate the test set accuracy of the classifier.
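A rough sketch of the miss-one-out idea as the abstract describes it, for binary 0/1 labels (the fold handling, label-flipping rule, and score definition are my assumptions, not the authors' implementation):

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold

def miss_one_out_score(estimator, X, y, k=10, seed=0):
    """For each fold, flip its labels, train on ALL the data (bad fold
    included), then check how many flipped labels the model corrects."""
    corrected = 0
    for _, bad in KFold(n_splits=k, shuffle=True, random_state=seed).split(X):
        y_noisy = y.copy()
        y_noisy[bad] = 1 - y_noisy[bad]           # mislabel this fold
        model = clone(estimator).fit(X, y_noisy)  # the bad fold stays in training
        corrected += np.sum(model.predict(X[bad]) == y[bad])
    return corrected / len(y)  # fraction of flipped labels corrected
```

A higher score suggests the classifier generalizes well enough to override the induced teacher noise, which is the quantity the authors relate to test set accuracy.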
5. Wong, Tzu-Tsung, and Nai-Yu Yang. "Dependency Analysis of Accuracy Estimates in k-Fold Cross Validation." IEEE Transactions on Knowledge and Data Engineering 29, no. 11 (November 1, 2017): 2417–27. http://dx.doi.org/10.1109/tkde.2017.2740926.

6. Fushiki, Tadayoshi. "Estimation of prediction error by using K-fold cross-validation." Statistics and Computing 21, no. 2 (October 10, 2009): 137–46. http://dx.doi.org/10.1007/s11222-009-9153-8.

7. Wiens, Trevor S., Brenda C. Dale, Mark S. Boyce, and G. Peter Kershaw. "Three way k-fold cross-validation of resource selection functions." Ecological Modelling 212, no. 3-4 (April 2008): 244–55. http://dx.doi.org/10.1016/j.ecolmodel.2007.10.005.

8. Nasution, Muhammad Rangga Aziz, and Mardhiya Hayaty. "Perbandingan Akurasi dan Waktu Proses Algoritma K-NN dan SVM dalam Analisis Sentimen Twitter" [Comparison of the Accuracy and Processing Time of the K-NN and SVM Algorithms in Twitter Sentiment Analysis]. Jurnal Informatika 6, no. 2 (September 5, 2019): 226–35. http://dx.doi.org/10.31311/ji.v6i2.5129.

Abstract:
Machine learning, a branch of computer science, has become a trend in recent years. Machine learning works by using data and algorithms to build a model from the patterns in a data set, and it also studies how the resulting model can predict outputs based on existing patterns. Two types of machine learning method can be used for sentiment analysis: supervised learning and unsupervised learning. This study compares two classification algorithms from supervised learning, K-Nearest Neighbor and Support Vector Machine, by building a model for each algorithm on sentiment text data. The comparison is carried out to determine which algorithm is better in terms of accuracy and processing time. The accuracy results show that the Support Vector Machine method is superior, with 89.70% without K-Fold Cross Validation and 88.76% with K-Fold Cross Validation, while in processing time the K-Nearest Neighbor method is superior, with 0.0160 s without K-Fold Cross Validation and 0.1505 s with K-Fold Cross Validation.
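A small sketch of this kind of accuracy-versus-time comparison under 10-fold cross validation (synthetic data stands in for the study's Twitter sentiment corpus, and the classifier settings are placeholders):

```python
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
for name, clf in [("K-NN", KNeighborsClassifier(n_neighbors=5)),
                  ("SVM", SVC(kernel="linear"))]:
    t0 = time.perf_counter()
    acc = cross_val_score(clf, X, y, cv=10).mean()   # 10-fold CV accuracy
    print(f"{name}: accuracy={acc:.4f}, time={time.perf_counter() - t0:.3f}s")
```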
9. Ahmed, Falah Y. H., Yasir Hassan Ali, and Siti Mariyam Shamsuddin. "Using K-Fold Cross Validation Proposed Models for Spikeprop Learning Enhancements." International Journal of Engineering & Technology 7, no. 4.11 (October 2, 2018): 145. http://dx.doi.org/10.14419/ijet.v7i4.11.20790.

Abstract:
A Spiking Neural Network (SNN) uses individual spikes in the time domain to perform and to communicate computation, in much the same way as actual neurons do. SNNs were not studied earlier because they were considered too complicated and too hard to analyze. Several limitations concerning the characteristics of SNNs that were not researched earlier have been resolved since the introduction of SpikeProp in 2000 by Sander Bohte as a supervised SNN learning model. This paper describes research developments enhancing SpikeProp learning using K-fold cross validation for dataset classification. It introduces acceleration factors for SpikeProp using Radius Initial Weight and Differential Evolution (DE) weight initialization as the proposed methods. In addition, training and testing with K-fold cross validation were investigated for the new proposed method, an improvement of Bohte's algorithm, using datasets obtained from a machine learning benchmark repository. The performance of the proposed method was compared with Backpropagation (BP) and the standard SpikeProp. The findings reveal that the proposed method performs better than both the standard SpikeProp and BP on all datasets evaluated by K-fold cross validation.
10. Jiang, Gaoxia, and Wenjian Wang. "Error estimation based on variance analysis of k-fold cross-validation." Pattern Recognition 69 (September 2017): 94–106. http://dx.doi.org/10.1016/j.patcog.2017.03.025.


Dissertations / Theses on the topic "K-fold validation"

1. Sood, Radhika. "Comparative Data Analytic Approach for Detection of Diabetes." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1544100930937728.

2. Ording, Marcus. "Context-Sensitive Code Completion: Improving Predictions with Genetic Algorithms." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-205334.

Abstract:
Within the area of context-sensitive code completion there is a need for accurate predictive models in order to provide useful code completion predictions. The traditional method for optimizing the performance of code completion systems is to empirically evaluate the effect of each system parameter individually and fine-tune the parameters. This thesis presents a genetic algorithm that can optimize the system parameters with a degree of freedom equal to the number of parameters to optimize. The study evaluates the effect of the optimized parameters on the prediction quality of the studied code completion system. Previous evaluation of the reference code completion system is also extended to include model size and inference speed. The results of the study show that the genetic algorithm is able to improve the prediction quality of the studied code completion system. Compared with the reference system, the enhanced system is able to recognize 1 in 10 additional previously unseen code patterns. This increase in prediction quality does not significantly impact system performance, as the inference speed remains less than 1 ms for both systems.
3. Piják, Marek. "Klasifikace emailové komunikace" [Classification of Email Communication]. Master's thesis, Brno University of Technology, Faculty of Information Technology, 2018. http://www.nusl.cz/ntk/nusl-385889.

Abstract:
This diploma thesis centers on creating a classifier able to recognize the email communication received daily by Topefekt s.r.o. and assign it to a classification class. The project implements some of the most commonly used classification methods, including machine learning, and the thesis includes an evaluation comparing all of the methods used.
4. Birba, Delwende Eliane. "A Comparative Study of Data Splitting Algorithms for Machine Learning Model Selection." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-287194.

Abstract:
Data splitting is commonly used in machine learning to divide data into training, test, or validation sets. This approach allows us to tune model hyperparameters and also to estimate generalization performance. In this research, we conducted a comparative analysis of different data partitioning algorithms on both real and simulated data. Our main objective was to address the question of how the choice of data splitting algorithm affects the estimation of generalization performance. The data splitting algorithms used in this study were variants of k-fold, Kennard-Stone (KS), SPXY (sample set partitioning based on joint x-y distance), and random sampling. Each algorithm divided the data into two subsets, training and validation; the training set was used to fit the model and the validation set for evaluation. We then analyzed the different data splitting algorithms based on the generalization performance estimated from the validation set and from an external test set. From the results, we noted that the important determinant of good generalization is the size of the dataset. For all the data splitting methods applied to small datasets, the gap between the performance estimated on the validation and test sets was significant, but the gap shrank when more data went into training or validation. Too much or too little data in the training set can also lead to poor model performance, so it is important to have a reasonable balance between the training and validation set sizes. In our study, KS and SPXY were the splitting algorithms with the poorest model performance estimation. Indeed, these methods select the most representative samples to train the model, leaving poorly representative samples for model performance estimation.
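For reference, a compact sketch of the Kennard-Stone selection the thesis compares against, using Euclidean distances (the thesis's variants, including SPXY's joint x-y distance, differ in the metric):

```python
import numpy as np
from scipy.spatial.distance import cdist

def kennard_stone(X, n_train):
    """Pick n_train maximally spread samples: start from the two most
    distant points, then repeatedly add the point whose nearest selected
    neighbor is farthest away. Returns (train_idx, validation_idx)."""
    d = cdist(X, X)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [i, j]
    while len(selected) < n_train:
        remaining = [p for p in range(len(X)) if p not in selected]
        # distance from each remaining point to its closest selected point
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(np.argmax(min_d))])
    rest = [p for p in range(len(X)) if p not in selected]
    return np.array(selected), np.array(rest)
```

Because the most representative points all land in the training set, the held-out points are atypical, which is consistent with the poor performance estimates the thesis reports for KS and SPXY.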
5. Martins, Natalie Henriques. "Modelos de agrupamento e classificação para os bairros da cidade do Rio de Janeiro sob a ótica da Inteligência Computacional: Lógica Fuzzy, Máquinas de Vetores Suporte e Algoritmos Genéticos" [Clustering and Classification Models for the Neighborhoods of the City of Rio de Janeiro from the Perspective of Computational Intelligence: Fuzzy Logic, Support Vector Machines, and Genetic Algorithms]. Universidade do Estado do Rio de Janeiro, 2015. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=9502.

Abstract:
Funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
From 2011 onward, events of great significance for the city of Rio de Janeiro have taken place or are scheduled, such as the United Nations Rio+20 conference and sporting events of worldwide importance (the FIFA World Cup and the Olympic and Paralympic Games). These events attract financial resources to the city, as well as generating jobs, improving infrastructure, and raising real-estate values, both of land and of buildings. When choosing a residential property in a given neighborhood, buyers evaluate not only the property itself but also the urban amenities available in the area. In this context, it was possible to define a qualitative linguistic interpretation of the neighborhoods of the city of Rio de Janeiro by integrating three Computational Intelligence techniques for the evaluation of benefits: Fuzzy Logic, Support Vector Machines, and Genetic Algorithms. The database was built from information on the web and from government institutes, capturing the cost of residential properties and the benefits and weaknesses of the city's neighborhoods. Fuzzy Logic was first implemented as an unsupervised clustering model through ellipsoidal rules by the Extension Principle using the Mahalanobis distance, inferentially configuring groups with the linguistic labels Good, Fair, and Poor according to twelve urban characteristics. From this discrimination, it was feasible to use Support Vector Machines integrated with Genetic Algorithms as a supervised method, in order to search for and select the smallest subset of the clustering variables that best classifies the neighborhoods (principle of parsimony). Analysis of the error rates made it possible to choose the best classification model with a reduced variable space, resulting in a subset containing information on: HDI, number of bus lines, educational institutions, average price per square meter, open-air spaces, entertainment venues, and crime. The modeling that combined the three Computational Intelligence techniques ranked the neighborhoods of Rio de Janeiro with acceptable error rates, supporting decision-making in the purchase and sale of residential properties. As for public transportation in the city, it was apparent that the road network still takes priority.
6. Luo, Shan. "Advanced Statistical Methodologies in Determining the Observation Time to Discriminate Viruses Using FTIR." Digital Archive @ GSU, 2009. http://digitalarchive.gsu.edu/math_theses/86.

Abstract:
Fourier transform infrared (FTIR) spectroscopy, a method of using electromagnetic radiation to detect specific cellular molecular structures, can be used to discriminate different types of cells. The objective is to find the minimum time (a choice among 2, 4, and 6 hours) for recording FTIR readings such that different viruses can be discriminated. A new method is adopted for the datasets. Briefly, inner differences are created as the control group, and the Wilcoxon signed rank test is used as the first variable-selection procedure to prepare the next stage of discrimination. In the second stage we propose either the partial least squares (PLS) method or simply taking significant differences as the discriminator. Finally, k-fold cross-validation is used to estimate the shrinkage of the goodness measures, such as sensitivity, specificity, and area under the ROC curve (AUC). There is no doubt in our minds that 6 hours is enough for discriminating mock from HSV-1 and Coxsackie viruses; adenovirus is an exception.
7. Tandan, Isabelle, and Erika Goteman. "Bank Customer Churn Prediction: A Comparison Between Classification and Evaluation Methods." Thesis, Uppsala universitet, Statistiska institutionen, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-411918.

Abstract:
This study aims to assess which supervised statistical learning method (random forest, logistic regression, or K-nearest neighbor) is best at predicting bank customer churn. Additionally, the study evaluates which cross-validation approach, k-fold cross-validation or leave-one-out cross-validation, yields the most reliable results. Predicting customer churn has increased in popularity since new technology, regulation, and changed demand have led to greater competition among banks; thus, with greater reason, banks acknowledge the importance of maintaining their customer base. The findings of this study are that an unrestricted random forest model estimated using k-fold cross-validation is preferable in terms of performance measurements, computational efficiency, and from a theoretical point of view. Although k-fold cross-validation and leave-one-out cross-validation yield similar results, k-fold cross-validation is preferable due to its computational advantages. For future research, methods that generate models with both good interpretability and high predictability would be beneficial, in order to combine knowledge of which customers end their engagement with an understanding of why. Moreover, interesting future research would be to analyze at which dataset size leave-one-out cross-validation and k-fold cross-validation yield the same results.
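A small sketch of the evaluation contrast the thesis draws, scoring the same model with k-fold and with leave-one-out cross-validation (toy data in place of the thesis's bank data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
rf = RandomForestClassifier(n_estimators=100, random_state=1)

kfold_acc = cross_val_score(
    rf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=1)).mean()
loo_acc = cross_val_score(rf, X, y, cv=LeaveOneOut()).mean()  # n fits, not 10
print(kfold_acc, loo_acc)  # typically similar, but LOO needs ~n/k times more fits
```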
8. Radeschnig, David. "Modelling Implied Volatility of American-Asian Options: A Simple Multivariate Regression Approach." Thesis, Mälardalens högskola, Akademin för utbildning, kultur och kommunikation, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-28951.

Abstract:
This report focuses on implied volatility for American-styled Asian options and a least squares approximation method for estimating its magnitude. Asian option prices are calculated/approximated based on quasi-Monte Carlo simulations and least squares regression, with a known volatility used as input. A regression tree then empirically builds a database of regression vectors for the implied volatility based on the simulated option prices. The mean squared errors between the input and estimated volatilities are then compared using a five-fold cross-validation test as well as the non-parametric Kruskal-Wallis hypothesis test of equal distributions. The study results in a proposed semi-parametric model for estimating implied volatilities from options. The user must, however, be aware that this model may suffer from estimation bias and should therefore be used with caution.
9. Bodin, Camilla. "Automatic Flight Maneuver Identification Using Machine Learning Methods." Thesis, Linköpings universitet, Reglerteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-165844.

Abstract:
This thesis proposes a general approach to solving the offline flight-maneuver identification problem using machine learning methods. The purpose of the study was to provide means for the aircraft professionals at the flight test and verification department of Saab Aeronautics to automate the procedure of analyzing flight test data. The suggested approach succeeded in generating binary classifiers and multiclass classifiers that identified six flight maneuvers of different complexity from real flight test data. The binary classifiers solved the problem of identifying one maneuver at a time from flight test data, while the multiclass classifiers identified several maneuvers simultaneously. To achieve these results, the difficulties entailed by this time series classification problem were simplified using different strategies. One strategy was to develop a maneuver extraction algorithm that used handcrafted rules; another was to represent the time series data by statistical measures. There was also an issue of an imbalanced dataset, where one class far outweighed the others in number of samples; this was solved by applying a modified oversampling method to the training dataset. Logistic regression, support vector machines with both linear and nonlinear kernels, and artificial neural networks were explored, where the hyperparameters for each machine learning algorithm were chosen during model estimation by 4-fold cross-validation and by solving an optimization problem based on important performance metrics. A feature selection algorithm was also used during model estimation to evaluate how the performance changes depending on how many features were used. The machine learning models were then evaluated on test data consisting of 24 flight tests. The results on the test data showed that the simplifications made were reasonable, but the maneuver extraction algorithm could sometimes fail. Some maneuvers were easier to identify than others, and the linear machine learning models fit the more complex classes poorly. In conclusion, both binary classifiers and multiclass classifiers could be used to solve the flight maneuver identification problem, and solving a hyperparameter optimization problem boosted the performance of the finalized models. Nonlinear classifiers performed best on average across all explored maneuvers.
10. Yeh, Po-Yang. "A Study on the Appropriateness of Repeating K-fold Cross Validation." Master's thesis, Department of Industrial and Information Management, National Cheng Kung University, 2017. http://ndltd.ncl.edu.tw/handle/6jc74q.

Abstract:
K-fold cross validation is a popular approach for evaluating the performance of classification algorithms. The variance of the accuracy estimate resulting from this approach is generally relatively large, which weakens conservative inference. Several studies have therefore suggested repeatedly performing K-fold cross validation to reduce the variance. Most of them did not consider the correlation among the repetitions of K-fold cross validation, and hence the variance could be underestimated. The purpose of this thesis is to study the appropriateness of repeating K-fold cross validation. We first investigate whether the accuracy estimates obtained from repetitions of K-fold cross validation can be assumed to be independent. The K-nearest neighbor algorithm with K = 1 is used to analyze the dependency relationships among the predictions of two repetitions of K-fold cross validation, and statistical methods are proposed to test the strength of these dependency relationships. The experimental results on twenty data sets show that the predictions in two repetitions of K-fold cross validation are generally highly correlated, and that the correlation grows as the number of folds increases. The results of a simulation study suggest that K-fold cross validation with a small number of repetitions and a large number of folds should be adopted.
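The thesis's caveat is easy to state in code: the sketch below collects one accuracy estimate per repetition of 10-fold cross validation and then reports their spread, which naively treats the repetitions as independent even though they reuse the same data and are therefore correlated:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)
clf = KNeighborsClassifier(n_neighbors=1)       # 1-NN, as in the experiments

estimates = []
for rep in range(10):                           # 10 repetitions of 10-fold CV
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=rep)
    estimates.append(cross_val_score(clf, X, y, cv=cv).mean())

# the repetition-to-repetition standard deviation understates the true
# uncertainty, since the repetitions are highly correlated
print(np.mean(estimates), np.std(estimates))
```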

Book chapters on the topic "K-fold validation"

1. Torres-Sospedra, Joaquín, Carlos Hernández-Espinosa, and Mercedes Fernández-Redondo. "Improving Adaptive Boosting with k-Cross-Fold Validation." In Lecture Notes in Computer Science, 397–402. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11816157_46.

2. Chowdhury, Pinaki Roy, and K. K. Shukla. "On Generalization and K-Fold Cross Validation Performance of MLP Trained with EBPDT." In Advances in Soft Computing — AFSS 2002, 352–59. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002. http://dx.doi.org/10.1007/3-540-45631-7_47.

3. Jiang, Ping, Zhigang Zeng, Jiejie Chen, and Tingwen Huang. "Generalized Regression Neural Networks with K-Fold Cross-Validation for Displacement of Landslide Forecasting." In Advances in Neural Networks – ISNN 2014, 533–41. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-12436-0_59.

4. Dahliyusmanto, Tutut Herawan, Syefrida Yulina, and Abdul Hanan Abdullah. "A Feature Selection Algorithm for Anomaly Detection in Grid Environment Using k-fold Cross Validation Technique." In Advances in Intelligent Systems and Computing, 619–30. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-51281-5_62.

5. Munna, Md Tahsir Ahmed, Mirza Mohtashim Alam, Shaikh Muhammad Allayear, Kaushik Sarker, and Sheikh Joly Ferdaus Ara. "Prediction Model for Prevalence of Type-2 Diabetes Complications with ANN Approach Combining with K-Fold Cross Validation and K-Means Clustering." In Advances in Intelligent Systems and Computing, 451–67. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-03402-3_31.

6. Munna, Md Tahsir Ahmed, Mirza Mohtashim Alam, Shaikh Muhammad Allayear, Kaushik Sarker, and Sheikh Joly Ferdaus Ara. "Prediction Model for Prevalence of Type-2 Diabetes Complications with ANN Approach Combining with K-Fold Cross Validation and K-Means Clustering." In Lecture Notes in Networks and Systems, 1031–45. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-12388-8_71.

7. Chen, Jiejie, Ping Jiang, Zhigang Zeng, and Boshan Chen. "R-RTRL Based on Recurrent Neural Network with K-Fold Cross-Validation for Multi-step-ahead Prediction Landslide Displacement." In Advances in Neural Networks – ISNN 2018, 468–75. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-92537-0_54.

8. Wally, Youssef, Yara Samaha, Ziad Yasser, Steffen Walter, and Friedhelm Schwenker. "Personalized k-fold Cross-Validation Analysis with Transfer from Phasic to Tonic Pain Recognition on X-ITE Pain Database." In Pattern Recognition. ICPR International Workshops and Challenges, 788–802. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-68780-9_59.

9. Matei, Alexander, and Stefan Ulbrich. "Detection of Model Uncertainty in the Dynamic Linear-Elastic Model of Vibrations in a Truss." In Lecture Notes in Mechanical Engineering, 281–95. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-77256-7_22.

Abstract:
Dynamic processes have always been of profound interest for scientists and engineers alike. Often, the mathematical models used to describe and predict time-variant phenomena are uncertain in the sense that governing relations between model parameters, state variables and the time domain are incomplete. In this paper we adopt a recently proposed algorithm for the detection of model uncertainty and apply it to dynamic models. This algorithm combines parameter estimation, optimum experimental design and classical hypothesis testing within a probabilistic frequentist framework. The best setup of an experiment is defined by optimal sensor positions and optimal input configurations which both are the solution of a PDE-constrained optimization problem. The data collected by this optimized experiment then leads to variance-minimal parameter estimates. We develop efficient adjoint-based methods to solve this optimization problem with SQP-type solvers. The crucial test which a model has to pass is conducted over the claimed true values of the model parameters which are estimated from pairwise distinct data sets. For this hypothesis test, we divide the data into k equally-sized parts and follow a k-fold cross-validation procedure. We demonstrate the usefulness of our approach in simulated experiments with a vibrating linear-elastic truss.
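A toy stand-in for the chapter's validation step, ignoring the PDE-constrained setting entirely: estimate one model parameter on k disjoint data parts and test pairwise whether the estimates agree (the simple z-test here is my simplification of the authors' hypothesis test):

```python
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 600)
y = 2.5 * x + rng.normal(0, 0.1, 600)   # true slope parameter: 2.5

k = 5
parts = np.array_split(rng.permutation(600), k)
fits = [stats.linregress(x[p], y[p]) for p in parts]  # slope + stderr per part

for (i, a), (j, b) in itertools.combinations(enumerate(fits), 2):
    z = (a.slope - b.slope) / np.hypot(a.stderr, b.stderr)
    p_val = 2 * stats.norm.sf(abs(z))
    print(f"parts {i} vs {j}: p = {p_val:.3f}")  # small p hints at model error
```

If the model form were wrong, the per-part estimates would disagree more often than the significance level allows, which is the signature of model uncertainty the chapter looks for.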
10. Ali, ABM Shawkat. "K-means Clustering Adopting rbf-Kernel." In Data Mining and Knowledge Discovery Technologies, 118–42. IGI Global, 2008. http://dx.doi.org/10.4018/978-1-59904-960-1.ch006.

Abstract:
Clustering in data mining has received a significant amount of attention from the machine learning community in the last few years as one of the fundamental research areas. Among the vast range of clustering algorithms, K-means is one of the most popular. In this research we extend the K-means algorithm by adding the well-known radial basis function (rbf) kernel and find better performance than the classical K-means algorithm. A critical issue for the rbf kernel is how to select a unique parameter for the optimum clustering task. This chapter provides a statistically based solution to this issue. The best parameter selection is considered on the basis of prior information about the data, using the Maximum Likelihood (ML) method and the Nelder-Mead (N-M) simplex method. A rule-based meta-learning approach is then proposed for automatic rbf kernel parameter selection. We consider 112 supervised data sets and measure the statistical data characteristics using basic statistics, central tendency measures, and an entropy-based approach. We split these data characteristics using the well-known decision tree approach to generate rules. Finally we use the generated rules to select the unique parameter value for the rbf kernel and then adopt it in the K-means algorithm. The experiments cover 112 problems with 10-fold cross validation, and the proposed algorithm can solve any clustering task very quickly with optimum performance.
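A compact sketch of the chapter's core construction, K-means run in the RBF feature space via the kernel trick (gamma is fixed by hand here, whereas the chapter's contribution is choosing it automatically from data characteristics):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kernel_kmeans(X, k, gamma=1.0, n_iter=50, seed=0):
    """K-means in RBF feature space: squared distances to cluster means
    need only kernel entries, never an explicit feature map."""
    K = rbf_kernel(X, gamma=gamma)
    n = len(X)
    labels = np.random.default_rng(seed).integers(0, k, n)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # empty cluster stays at +inf, so no point joins it
            # ||phi(x_i) - mu_c||^2 = K_ii - (2/m) sum_j K_ij + (1/m^2) sum_jl K_jl
            # (K_ii is the same for every cluster, so it is dropped)
            dist[:, c] = (-2.0 / m) * K[:, mask].sum(axis=1) \
                         + K[np.ix_(mask, mask)].sum() / m ** 2
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```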

Conference papers on the topic "K-fold validation"

1. dos Santos, Priscila G. M., Ismael C. S. Araujo, Rodrigo S. Sousa, and Adenilton J. da Silva. "Quantum Enhanced k-fold Cross-Validation." In 2018 7th Brazilian Conference on Intelligent Systems (BRACIS). IEEE, 2018. http://dx.doi.org/10.1109/bracis.2018.00041.

2. Juda, P., P. Renard, and J. Straubhaar. "K-fold Cross-validation of Multiple-point Statistical Simulations." In Petroleum Geostatistics 2019. European Association of Geoscientists & Engineers, 2019. http://dx.doi.org/10.3997/2214-4609.201902239.

3. Alippi, Cesare, and Manuel Roveri. "Virtual k-fold cross validation: An effective method for accuracy assessment." In 2010 International Joint Conference on Neural Networks (IJCNN). IEEE, 2010. http://dx.doi.org/10.1109/ijcnn.2010.5596899.

4. Nie, Yali, Laura De Santis, Marco Carratu, Mattias O'Nils, Paolo Sommella, and Jan Lundgren. "Deep Melanoma Classification with K-Fold Cross-Validation for Process Optimization." In 2020 IEEE International Symposium on Medical Measurements and Applications (MeMeA). IEEE, 2020. http://dx.doi.org/10.1109/memea49120.2020.9137222.

5. Mubang, Fred. "Using K-Fold Cross Validation and ResNet Ensembles to Predict Cooking States." In State Recognition Symposium. RPAL, 2019. http://dx.doi.org/10.32555/2019.dl.017.

6. Yadav, Sanjay, and Sanyam Shukla. "Analysis of k-Fold Cross-Validation over Hold-Out Validation on Colossal Datasets for Quality Classification." In 2016 IEEE 6th International Conference on Advanced Computing (IACC). IEEE, 2016. http://dx.doi.org/10.1109/iacc.2016.25.

7. Karal, Omer. "Performance comparison of different kernel functions in SVM for different k value in k-fold cross-validation." In 2020 Innovations in Intelligent Systems and Applications Conference (ASYU). IEEE, 2020. http://dx.doi.org/10.1109/asyu50717.2020.9259880.

8. Caon, Daniel R. S., Asmaa Amehraye, Joseph Razik, Gerard Chollet, Rodrigo V. Andreao, and Chafic Mokbel. "Experiments on acoustic model supervised adaptation and evaluation by K-Fold Cross Validation technique." In 2010 5th International Symposium on I/V Communications and Mobile Network (ISVC). IEEE, 2010. http://dx.doi.org/10.1109/isvc.2010.5656264.

9. Tamilarasi, P., and R. Uma Rani. "Diagnosis of Crime Rate against Women using k-fold Cross Validation through Machine Learning." In 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC). IEEE, 2020. http://dx.doi.org/10.1109/iccmc48092.2020.iccmc-000193.

10. Hu, Chao, Byeng D. Youn, and Pingfeng Wang. "Ensemble of Data-Driven Prognostic Algorithms With Weight Optimization and K-Fold Cross Validation." In ASME 2010 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. ASMEDC, 2010. http://dx.doi.org/10.1115/detc2010-29182.

Abstract:
The traditional data-driven prognostic approach is to construct multiple candidate algorithms using a training data set, evaluate their respective performance using a testing data set, and select the one with the best performance while discarding all the others. This approach has three shortcomings: (i) the selected standalone algorithm may not be robust, i.e., it may be less accurate when the real data acquired after the deployment differs from the testing data; (ii) it wastes the resources for constructing the algorithms that are discarded in the deployment; (iii) it requires the testing data in addition to the training data, which increases the overall expenses for the algorithm selection. To overcome these drawbacks, this paper proposes an ensemble data-driven prognostic approach which combines multiple member algorithms with a weighted-sum formulation. Three weighting schemes, namely, the accuracy-based weighting, diversity-based weighting and optimization-based weighting, are proposed to determine the weights of member algorithms for data-driven prognostics. The k-fold cross validation (CV) is employed to estimate the prediction error required by the weighting schemes. Two case studies were employed to demonstrate the effectiveness of the proposed prognostic approach. The results suggest that the ensemble approach with any weighting scheme gives more accurate RUL predictions compared to any sole algorithm and that the optimization-based weighting scheme gives the best overall performance among the three weighting schemes.
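A stripped-down sketch of the paper's weighted-sum formulation under the accuracy-based scheme: each member's k-fold CV error sets its weight (generic regressors stand in for the prognostic algorithms, and the inverse-error rule is my assumption of how "accuracy-based" weights are computed):

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=400, n_features=8, noise=5.0, random_state=0)
members = [Ridge(alpha=1.0),
           KNeighborsRegressor(n_neighbors=5),
           RandomForestRegressor(n_estimators=200, random_state=0)]

# k-fold CV error of each member -> inverse-error weights summing to one
mse = np.array([-cross_val_score(m, X, y, cv=10,
                                 scoring="neg_mean_squared_error").mean()
                for m in members])
weights = (1.0 / mse) / (1.0 / mse).sum()

fitted = [clone(m).fit(X, y) for m in members]

def ensemble_predict(X_new):
    """Weighted-sum prediction over all fitted member algorithms."""
    return sum(w * m.predict(X_new) for w, m in zip(weights, fitted))
```

Note that no member is discarded: every algorithm contributes in proportion to its cross-validated accuracy, which is the robustness argument the abstract makes.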