Journal articles on the topic 'Kaggle'

1

Сикулер, Денис Валерьевич. "RESOURCES PROVIDING DATA FOR MACHINE LEARNING AND TESTING ARTIFICIAL INTELLIGENCE TECHNOLOGIES." Информационные и математические технологии в науке и управлении, no. 2(22) (June 25, 2021): 39–52. http://dx.doi.org/10.38028/esi.2021.22.2.004.

Abstract:
This article reviews 10 Internet resources that can be used to find data for a variety of tasks related to machine learning and artificial intelligence. It covers both widely known sites (for example, Kaggle and the Registry of Open Data on AWS) and less popular or narrowly specialized ones (for example, The Big Bad NLP Database and Common Crawl). All of the resources provide free access to data, and in most cases registration is not even required. For each resource, the characteristics and particulars of searching for and obtaining datasets are described. The following sites are covered: Kaggle, Google Research, Microsoft Research Open Data, Registry of Open Data on AWS, Harvard Dataverse Repository, Zenodo, the Open Data Portal of the Russian Federation, World Bank, The Big Bad NLP Database, and Common Crawl.
2

Al-Taie, Mohammed Zuhair, Naomie Salim, and Adekunle Isiaka Obasa. "Successful Data Science Projects: Lessons Learned from Kaggle Competition." Kurdistan Journal of Applied Research 2, no. 3 (August 27, 2017): 40–49. http://dx.doi.org/10.24017/science.2017.3.18.

Abstract:
The workflow from data understanding to deployment of an analytical model of a data science project begins at framing the problem at hand, a task that is typically business-oriented and requires human-to-human interaction. However, the next three steps: data understanding, feature extraction, and model building that come next in the pipeline are the key to successful data science projects. Failing to fully understand the requirements of each of these three steps can negatively affect the performance of the proposed system. Hence, the current study tries to answer the following question “What are the requirements of a successful data science project?” To answer this question, we will use the solution that we built to measure the relevance of local search results of small online e-businesses and submitted to Kaggle data science platform to shed light on why our solution did not achieve a top position among other competitors. Evaluation of the design that we submitted to the competition is going to be carried out in the spirit of the three winning submissions. Our results revealed that well-performed data preprocessing, well-defined features, and model ensembling are critical for building successful data science projects. Such a clarification provides insight into specific aspects of model design to help others including Kagglers avoid possible mistakes while approaching their data science projects.
3

Lee, Chaehyeon, Jaehyeop Choi, and Heechul Jung. "Deep Learning-based Bengali Handwritten Grapheme Classification for Kaggle Bengali.AI Challenge." Journal of the Institute of Electronics and Information Engineers 57, no. 9 (September 30, 2020): 67–76. http://dx.doi.org/10.5573/ieie.2020.57.9.67.

4

Jinfeng, Gao, Sehrish Qummar, Zhang Junming, Yao Ruxian, and Fiaz Gul Khan. "Ensemble Framework of Deep CNNs for Diabetic Retinopathy Detection." Computational Intelligence and Neuroscience 2020 (December 15, 2020): 1–11. http://dx.doi.org/10.1155/2020/8864698.

Abstract:
Diabetic retinopathy (DR) is an eye disease that damages the blood vessels of the eye. DR causes blurred vision and may lead to blindness if it is not detected in its early stages. DR has five stages: 0 normal, 1 mild, 2 moderate, 3 severe, and 4 PDR. Conventionally, many hands-on computer vision projects have been applied to detect DR, but they cannot encode the intricate underlying features. They therefore classify DR stages poorly, particularly the early stages. In this research, two deep CNN models were proposed with an ensemble technique to detect all the stages of DR using balanced and imbalanced datasets. The models were trained with the Kaggle dataset on a high-end graphics processing unit (GPU). A balanced dataset was used to train both models, and the models were tested with balanced and imbalanced datasets. The results show that the proposed models detect all the stages of DR, unlike current methods, and perform better than state-of-the-art methods on the same Kaggle dataset.
5

Assegie, Tsehay Admassu, R. Lakshmi Tulasi, and N. Komal Kumar. "Breast cancer prediction model with decision tree and adaptive boosting." IAES International Journal of Artificial Intelligence (IJ-AI) 10, no. 1 (March 1, 2021): 184. http://dx.doi.org/10.11591/ijai.v10.i1.pp184-190.

Abstract:
In this study, a breast cancer prediction model is proposed using a decision tree and adaptive boosting (AdaBoost), and an extensive experimental evaluation of its predictive performance is conducted. The study uses a breast cancer dataset collected from the Kaggle data repository. The dataset consists of 569 observations, of which 212 (37.25%) are benign, or breast cancer negative, and 62.74% are malignant, or breast cancer positive. The class distribution shows that the dataset is highly imbalanced, so a learning algorithm such as a decision tree is biased toward the benign observations and performs poorly when predicting the malignant ones. To improve the performance of the decision tree on the malignant observations, a boosting algorithm, namely adaptive boosting, is employed. Finally, the predictive performance of the decision tree and of adaptive boosting is analyzed. On the Kaggle breast cancer data, adaptive boosting reaches 92.53% accuracy and the decision tree 88.80%. Overall, the AdaBoost algorithm performed better than the decision tree.
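The class shares quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes only the two counts the abstract gives (569 observations, 212 in the smaller class); the helper name `class_shares` is hypothetical, and which class the 212 belongs to is left as the authors state it.

```python
def class_shares(total, smaller):
    """Return (smaller-class %, larger-class %) for a two-class dataset."""
    if total <= 0 or not 0 <= smaller <= total:
        raise ValueError("need total > 0 and 0 <= smaller <= total")
    smaller_pct = 100.0 * smaller / total
    return smaller_pct, 100.0 - smaller_pct


# Counts taken from the abstract: 569 observations, 212 in the smaller class.
small, large = class_shares(569, 212)
print(f"{small:.2f}% / {large:.2f}%")
```

Rounded to two decimals the smaller share comes out at 37.26%, so the 37.25% in the abstract looks like a truncated rather than a rounded figure; the 62.74% share matches.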
6

Bratholm, Lars A., Will Gerrard, Brandon Anderson, Shaojie Bai, Sunghwan Choi, Lam Dang, Pavel Hanchar, et al. "A community-powered search of machine learning strategy space to find NMR property prediction models." PLOS ONE 16, no. 7 (July 20, 2021): e0253612. http://dx.doi.org/10.1371/journal.pone.0253612.

Abstract:
The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published ‘in-house’ efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties.
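The idea of a meta-ensemble built as a linear combination of predictions can be illustrated on a toy scale. The sketch below is not the authors' method: it blends only two models, fits a single weight by least squares, and scores with mean squared error purely for convenience. All data values and the names `blend_weight` and `mse` are invented for illustration.

```python
def mse(pred, truth):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def blend_weight(a, b, truth):
    """Least-squares weight w for the blend w*a + (1 - w)*b."""
    num = sum((x - y) * (t - y) for x, y, t in zip(a, b, truth))
    den = sum((x - y) ** 2 for x, y in zip(a, b))
    return num / den if den else 0.5  # identical models: any weight works


# Toy data, invented for illustration (not from the competition).
truth = [1.0, 2.0, 3.0, 4.0]
model_a = [1.2, 2.1, 2.7, 4.3]
model_b = [0.7, 1.8, 3.2, 3.9]

w = blend_weight(model_a, model_b, truth)
blend = [w * x + (1 - w) * y for x, y in zip(model_a, model_b)]

print("MSE a:", mse(model_a, truth))
print("MSE b:", mse(model_b, truth))
print("MSE blend:", mse(blend, truth))
```

On the data used to fit the weight, the blend can never score worse than either input model, because weights of 1 and 0 reproduce the individual models. In practice the weight would be fitted on held-out predictions so that the gain is not an artifact of overfitting.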
7

Han, Gyeong Jin, and Keuntae Cho. "PLS Path Modeling to Investigate the Relations between Competencies of Data Scientist and Big Data Analysis Performance : Focused on Kaggle Platform." Journal of Korean Institute of Industrial Engineers 42, no. 2 (April 15, 2016): 112–21. http://dx.doi.org/10.7232/jkiie.2016.42.2.112.

8

Kiehn, Moritz, Sabrina Amrouche, Paolo Calafiura, Victor Estrade, Steven Farrell, Cécile Germain, Vava Gligorov, et al. "The TrackML high-energy physics tracking challenge on Kaggle." EPJ Web of Conferences 214 (2019): 06037. http://dx.doi.org/10.1051/epjconf/201921406037.

Abstract:
The High-Luminosity LHC (HL-LHC) is expected to reach unprecedented collision intensities, which in turn will greatly increase the complexity of tracking within the event reconstruction. To reach out to computer science specialists, a tracking machine learning challenge (TrackML) was set up on Kaggle by a team of ATLAS, CMS, and LHCb physicists tracking experts and computer scientists building on the experience of the successful Higgs Machine Learning challenge in 2014. A training dataset based on a simulation of a generic HL-LHC experiment tracker has been created, listing for each event the measured 3D points, and the list of 3D points associated to a true track. The participants in the challenge should find the tracks in the test dataset, which means building the list of 3D points belonging to each track. The emphasis is on exposing innovative approaches rather than hyper-optimising known approaches. A metric reflecting the accuracy of a model at finding the proper associations that matter most to physics analysis will make it possible to select good candidates to augment or replace existing algorithms.
9

Carpita, Maurizio, Enrico Ciavolino, and Paola Pasca. "Exploring and modelling team performances of the Kaggle European Soccer database." Statistical Modelling 19, no. 1 (January 10, 2019): 74–101. http://dx.doi.org/10.1177/1471082x18810971.

Abstract:
This study explores a big and open database of soccer leagues in 10 European countries. Data related to players, teams and matches covering seven seasons (from 2009/2010 to 2015/2016) were retrieved from Kaggle, an online platform in which big data are available for predictive modelling and analytics competition among data scientists. Based on preliminary data analysis, experts’ evaluation and players’ position on the football pitch, role-based indicators of teams’ performance have been built and used to estimate the win probability of the home team with the binomial logistic regression (BLR) model, which has been extended to include the ELO rating predictor and two random effects due to the hierarchical structure of the dataset. The predictive power of the BLR model and its extensions has been compared with that of other statistical modelling approaches (Random Forest, Neural Network, k-NN, Naïve Bayes). Results showed that role-based indicators substantially improved the performance of all the models used in both this work and in previous works available on Kaggle. The base BLR model increased prediction accuracy by 10 percentage points, and showed the importance of defence performances, especially in the last seasons. Inclusion of both the ELO rating predictor and the random effects did not substantially improve prediction, as the simpler BLR model performed equally well. With respect to the other models, only Naïve Bayes showed more balanced results in predicting both win and no-win of the home team.
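For readers unfamiliar with the ELO rating mentioned in this abstract, the sketch below shows the standard Elo expected-score formula with the usual scale of 400. The abstract does not say which Elo variant the authors used or how it enters the regression, so the formula choice, the ratings and the name `elo_expected` are assumptions made for illustration.

```python
def elo_expected(rating_home, rating_away):
    """Standard Elo expected score for the home side (scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_away - rating_home) / 400.0))


# Hypothetical ratings: equal teams, then a 400-point home advantage.
even = elo_expected(1500, 1500)
favourite = elo_expected(1900, 1500)
print(even, favourite)
```

The expected score counts a draw as half a win, so it is a predictor that a win-probability model could take as input rather than a win probability in itself.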
10

Ben Taieb, Souhaib, and Rob J. Hyndman. "A gradient boosting approach to the Kaggle load forecasting competition." International Journal of Forecasting 30, no. 2 (April 2014): 382–94. http://dx.doi.org/10.1016/j.ijforecast.2013.07.005.

11

Maier, Torsten, Joanna DeFranco, and Christopher Mccomb. "An analysis of design process and performance in distributed data science teams." Team Performance Management: An International Journal 25, no. 7/8 (October 14, 2019): 419–39. http://dx.doi.org/10.1108/tpm-03-2019-0024.

Abstract:
Purpose: Often, it is assumed that teams are better at solving problems than individuals working independently. However, recent work in engineering, design and psychology contradicts this assumption. This study aims to examine the behavior of teams engaged in data science competitions. Crowdsourced competitions have seen increased use for software development and data science, and platforms often encourage teamwork between participants. Design/methodology/approach: We specifically examine the teams participating in data science competitions hosted by Kaggle. We analyze the data provided by Kaggle to compare the effect of team size and interaction frequency on team performance. We also contextualize these results through a semantic analysis. Findings: This work demonstrates that groups of individuals working independently may outperform interacting teams on average, but that small, interacting teams are more likely to win competitions. The semantic analysis revealed differences in forum participation, verb usage and pronoun usage when comparing top- and bottom-performing teams. Research limitations/implications: These results reveal a perplexing tension that must be explored further: true teams may experience better performance with higher cohesion, but nominal teams may perform even better on average with essentially no cohesion. Limitations of this research include not factoring in team member experience level and reliance on extant data. Originality/value: These results are potentially of use to designers of crowdsourced data science competitions as well as managers and contributors to distributed software development projects.
12

Kyu Park, Yong, Kyung Shin Kim, Jang Il Kim, Sung Hee Kim, and Kil Hung Lee. "A proposals of convolution neural network system for malicious code analysis based on cloud systems." International Journal of Engineering & Technology 7, no. 2.12 (April 3, 2018): 80. http://dx.doi.org/10.14419/ijet.v7i2.12.11040.

Abstract:
Background/Objectives: In the information security field, artificial intelligence must be applied first. This is because the frequency of malicious code is too high and the processing method is too difficult, which is very difficult for humans to handle. Methods/Statistical analysis: In this paper, we developed a program to convert malicious codes into images and a Tensorflow system to classify malicious codes. The malware used as input was the computer virus code used in the BIG 2015 Challenge. This dataset, called a Kaggle dataset, consists of 10,868 bytes of train set. Findings: We used the Tensorflow SLIM library to develop this machine learning malware learning machine. This resulted in more than 80% accuracy. Especially, when the CRIS-Ensemble algorithm was added, the accuracy was 97%. The study of malicious code analysis using machine learning consists of two major parts. First, the process of making the virus into images is important. To classify 10,868 Kaggle malware datasets that the BIG 2015 winner showed 99.6% accuracy, Tensorflow's accuracy and parameter tuning are important, but finding the way to make good images is the most important technique. Improvements/Applications: The results show that the malicious code classification system using machine learning can be an effective method to classify malicious code, given the accuracy of its results and its ease of use.
13

Putra, Muhammad Reza, and Azuraliza Abu Bakar. "Data Preprocessing: Case Study on monthly number of visitors to Taiwan by their residence and purpose." Jurnal KomtekInfo 7, no. 1 (January 14, 2020): 1–14. http://dx.doi.org/10.35134/komtekinfo.v7i1.60.

Abstract:
This paper gives a detailed preliminary report on a dataset and explains how data preprocessing, mainly data cleaning and data reduction, is applied to it. The dataset records the monthly number of visitors to Taiwan by residence and purpose. It was obtained from Kaggle, where it was scraped from the Taiwan Tourism Bureau. The foreign visitor data cover all foreign visitors who arrived directly in Taiwan through airports, ports and land.
14

Nafi'iyah, Nur, Rizki Ardhian Ahmad, and Siti Mujilahwati. "Prediksi Nilai Calon Mahasiswa dengan Algoritma Backpropagation (Studi Kasus: Data Kaggle)." Jurnal Nasional Komputasi dan Teknologi Informasi (JNKTI) 3, no. 1 (April 29, 2020): 9–17. http://dx.doi.org/10.32672/jnkti.v3i1.1945.

Abstract:
Students applying to a university, at either the undergraduate or the postgraduate level, must go through selection. The selection process consists of tests and a series of other activities. The test scores are then analysed to determine whether a student should be admitted. Some universities in the United States and the United Kingdom run a series of tests covering academic ability, English proficiency and research skills. Data from such selection tests or exams can be used to make predictions about prospective new students entering university. The aim of this study is to predict the scores of prospective university students. The case study takes its data from Kaggle and makes predictions using the backpropagation algorithm. The input variables are GRE score, TOEFL score, university rating, SOP, LOR, GPA and research. The output is the predicted score of a prospective student as a number. Backpropagation training was carried out with the Matlab tool using two network architectures. Model 1 uses a 7-5-1 architecture and reaches an MSE of 0.00272. Model 2 uses a 7-4-1 architecture and reaches an MSE of 0.0029.
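The two architectures named in this abstract, 7-5-1 and 7-4-1, differ only in the width of the hidden layer. As a rough size comparison, the sketch below counts trainable parameters under the assumption that both networks are fully connected with one bias per non-input unit; the abstract does not confirm this, and the helper name `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


model_1 = count_parameters([7, 5, 1])  # 7 inputs, 5 hidden units, 1 output
model_2 = count_parameters([7, 4, 1])  # 7 inputs, 4 hidden units, 1 output
print(model_1, model_2)
```

Under these assumptions the first model has 46 parameters and the second 37, so the two reported MSE values come from networks of similar capacity.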
15

Yu, Kun-Hsing, Tsung-Lu Michael Lee, Ming-Hsuan Yen, S. C. Kou, Bruce Rosen, Jung-Hsien Chiang, and Isaac S. Kohane. "Reproducible Machine Learning Methods for Lung Cancer Detection Using Computed Tomography Images: Algorithm Development and Validation." Journal of Medical Internet Research 22, no. 8 (August 5, 2020): e16709. http://dx.doi.org/10.2196/16709.

Abstract:
Background Chest computed tomography (CT) is crucial for the detection of lung cancer, and many automated CT evaluation methods have been proposed. Due to the divergent software dependencies of the reported approaches, the developed methods are rarely compared or reproduced. Objective The goal of the research was to generate reproducible machine learning modules for lung cancer detection and compare the approaches and performances of the award-winning algorithms developed in the Kaggle Data Science Bowl. Methods We obtained the source codes of all award-winning solutions of the Kaggle Data Science Bowl Challenge, where participants developed automated CT evaluation methods to detect lung cancer (training set n=1397, public test set n=198, final test set n=506). The performance of the algorithms was evaluated by the log-loss function, and the Spearman correlation coefficient of the performance in the public and final test sets was computed. Results Most solutions implemented distinct image preprocessing, segmentation, and classification modules. Variants of U-Net, VGGNet, and residual net were commonly used in nodule segmentation, and transfer learning was used in most of the classification algorithms. Substantial performance variations in the public and final test sets were observed (Spearman correlation coefficient = .39 among the top 10 teams). To ensure the reproducibility of results, we generated a Docker container for each of the top solutions. Conclusions We compared the award-winning algorithms for lung cancer detection and generated reproducible Docker images for the top solutions. Although convolutional neural networks achieved decent accuracy, there is plenty of room for improvement regarding model generalizability.
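The Spearman correlation used in this abstract compares how teams rank on the public test set with how they rank on the final one. The sketch below implements the textbook rank-difference formula, which is valid only when there are no tied scores. The log-loss values are invented for illustration and the function names are hypothetical; nothing here reproduces the study's data.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Invented log-loss scores for five hypothetical teams.
public_scores = [0.40, 0.42, 0.45, 0.47, 0.50]
final_scores = [0.44, 0.41, 0.52, 0.46, 0.49]
rho = spearman(public_scores, final_scores)
print(rho)
```

A coefficient well below 1, like the .39 reported among the top 10 teams, means that the public leaderboard order only loosely predicted the final order.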
16

Jagjeet Singh and Vibhor Sharma. "Movie Genre Prediction Based on Plot Synopsis." November 2020 6, no. 11 (November 23, 2020): 118–21. http://dx.doi.org/10.46501/ijmtst061121.

Abstract:
Movies have now become one of the main sources of entertainment for people. The extensive use of the Internet has increased the creation and sharing of movie-related data online. Movie plot summaries generally indicate the movie's genres, and many people read them before deciding to watch a movie. An automatic system can be applied to predict genres based on summaries. The dataset we chose consists of 14,828 movies taken from Kaggle. We use different approaches such as TF-IDF, char-gram and skip-gram to get better accuracy scores in predicting movie genre tags.
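TF-IDF, the first feature approach this abstract lists, weights a word by how often it occurs in one summary and how rare it is across all summaries. The sketch below uses one common variant (relative term frequency times the natural log of N over document frequency); the abstract does not state which variant or tokenizer the authors used, and the three plot lines are invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Invented plot summaries, for illustration only.
plots = [
    "a detective hunts a killer",
    "a wizard hunts a dragon",
    "a couple falls in love",
]
vectors = tf_idf(plots)
print(vectors[0])
```

A word that appears in every summary, such as "a", gets weight zero, while a word unique to one summary, such as "detective", gets the largest weight in that summary.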
17

Kumar, Akshi, Arunima Jaiswal, Shikhar Garg, Shobhit Verma, and Siddhant Kumar. "Sentiment Analysis Using Cuckoo Search for Optimized Feature Selection on Kaggle Tweets." International Journal of Information Retrieval Research 9, no. 1 (January 2019): 1–15. http://dx.doi.org/10.4018/ijirr.2019010101.

Abstract:
Selecting the optimal set of features to determine sentiment in online textual content is imperative for superior classification results. Optimal feature selection is a computationally hard task and fosters the need for novel techniques to improve classifier performance. In this work, the binary adaptation of cuckoo search (a nature-inspired, meta-heuristic algorithm), known as the Binary Cuckoo Search, is proposed for optimum feature selection in sentiment analysis of textual online content. Baseline supervised learning techniques such as SVM have first been implemented with the traditional tf-idf model and then with the novel feature optimization model. A benchmark Kaggle dataset, which includes a collection of tweets, is used to report the results. The results are assessed on the basis of performance accuracy. Empirical analysis validates that the proposed implementation of a binary cuckoo search for feature selection optimization in a sentiment analysis task outperforms the elementary supervised algorithms based on the conventional tf-idf score.
18

Assegie, Tsehay Admassu, and Pramod Sekharan Nair. "Handwritten digits recognition with decision tree classification: a machine learning approach." International Journal of Electrical and Computer Engineering (IJECE) 9, no. 5 (October 1, 2019): 4446. http://dx.doi.org/10.11591/ijece.v9i5.pp4446-4451.

Abstract:
Handwritten digit recognition is an area of machine learning in which a machine is trained to identify handwritten digits. One method of achieving this is a decision tree classification model. Decision tree classification is a machine learning approach that uses the predefined labels from past known sets to determine or predict the classes of future data sets where the class labels are unknown. In this paper we have used the standard Kaggle digits dataset for recognition of handwritten digits using a decision tree classification approach, and we have evaluated the accuracy of the model against each digit from 0 to 9.
19

Nugroho, Budi, and Eva Yulia Puspaningrum. "Kinerja Metode CNN untuk Klasifikasi Pneumonia dengan Variasi Ukuran Citra Input." Jurnal Teknologi Informasi dan Ilmu Komputer 8, no. 3 (June 15, 2021): 533. http://dx.doi.org/10.25126/jtiik.2021834515.

Abstract:
This research developed a pneumonia detection machine based on X-ray images of the lungs. The method used is the Convolutional Neural Network (CNN), with an architecture that differs from those in a number of previous studies. The CNN model is also modified so that the classification stage uses the Extreme Learning Machine (ELM); the result is called the CNN-ELM method. The experimental dataset is a collection of lung X-ray images on Kaggle consisting of 1,583 normal images and 4,237 pneumonia images. Image sizes in the Kaggle dataset vary, but almost all are larger than 1000x1000 pixels. Images this large can make classification less effective, so CNN machines usually work with reduced-size images. In this research, experiments were carried out with various input image sizes to determine the effect on the classifier's performance. The experimental results show that input image size has a significant effect on pneumonia classification performance for both the CNN and CNN-ELM methods. Both methods showed their highest performance at an input image size of 200x200. CNN-ELM performed better than CNN in all test scenarios and at all input image sizes. At the highest-performing configuration, the difference in accuracy between CNN-ELM and CNN reaches 8.81% and the difference in F1-score reaches 0.0729.
20

Reddy, M. Srilekha. "Covid-19 Detection using Deep Learning." International Journal for Research in Applied Science and Engineering Technology 9, no. VI (June 30, 2021): 3835–540. http://dx.doi.org/10.22214/ijraset.2021.35813.

Abstract:
Recently, the COVID-19 virus has spread widely throughout the world, has led to the examination of large numbers of suspected cases using standard COVID-19 tests, and has become a pandemic. Everyday life, public health and the global economy have been severely disrupted. Pathogenic laboratory tests such as polymerase chain reaction (PCR) are considered the gold standard for diagnosis, but they take a long time and can give false negative results. Therefore, there was an urgent need for rapid and accurate diagnostic methods to detect COVID-19 cases as soon as possible, in order to prevent the spread of this epidemic and combat it. Applying advanced artificial intelligence techniques along with radiography may be helpful in detecting this disease. In this study, we propose a classification model that detects the infected condition from chest X-ray images. A dataset containing chest X-ray images of normal people, people with pneumonia such as SARS, streptococcus and pneumococcus, and other patients with COVID-19 was collected. Histogram of oriented gradients (HOG) is used for image feature extraction. The images are then classified using Support Vector Machines (SVM), random forests and K-nearest neighbours (KNN), with classification rates of 98.14%, 96.29% and 88.89% respectively. These results may contribute efficiently to detecting COVID-19 disease. The input dataset is taken from Kaggle, which provides the dataset for analysis and helps to get the best possible solutions from the set of problems. Kaggle is launching a companion COVID-19 forecasting challenge to help answer a subset of the NASEM/WHO questions. While the challenge involves forecasting confirmed cases and fatalities between April 1 and April 30 by region, the primary goal isn't only to produce accurate forecasts. It’s also to identify factors that appear to impact the transmission rate of COVID-19.
21

Soydaner, Derya. "A Comparison of Optimization Algorithms for Deep Learning." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 13 (April 30, 2020): 2052013. http://dx.doi.org/10.1142/s0218001420520138.

Abstract:
In recent years, we have witnessed the rise of deep learning. Deep neural networks have proved their success in many areas. However, the optimization of these networks has become more difficult as neural networks go deeper and datasets become bigger. Therefore, more advanced optimization algorithms have been proposed over the past years. In this study, widely used optimization algorithms for deep learning are examined in detail. To this end, these algorithms, called adaptive gradient methods, are implemented for both supervised and unsupervised tasks. The behavior of the algorithms during training and their results on four image datasets, namely MNIST, CIFAR-10, Kaggle Flowers and Labeled Faces in the Wild, are compared by pointing out their differences from basic optimization algorithms.
22

Alamsyah, Alamsyah, Budi Prasetiyo, M. Faris Al Hakim, and Fadli Dony Pradana. "Prediction of COVID-19 Using Recurrent Neural Network Model." Scientific Journal of Informatics 8, no. 1 (May 10, 2021): 98–103. http://dx.doi.org/10.15294/sji.v8i1.30070.

Abstract:
The first human COVID-19 infection was discovered in China at the end of 2019. Since then, COVID-19 has spread to almost all countries in the world. To overcome this problem, infected people need to be identified more quickly. One alternative approach to diagnosing potential COVID-19 disease is the Recurrent Neural Network (RNN). In this paper, an RNN is implemented using the Elman network and applied to the COVID-19 dataset from Kaggle. The dataset is split into 70% training data and 30% test data. The learning parameters used were the maximum epoch, learning rate, and hidden nodes. The results show an accuracy of 88%.
23

Hadi, Sofian Wira, Muhammad Fahmi Julianto, Syaifur Rahmatullah, and Windu Gata. "ANALISA CLUSTER APLIKASI PADA APP STORE DENGAN MENGGUNAKAN METODE K-MEANS." Bianglala Informatika 8, no. 2 (October 1, 2020): 86–90. http://dx.doi.org/10.31294/bi.v8i2.8191.

Abstract:
For iPhone users, the App Store is one of the places to download hundreds of thousands of applications. iOS applications are divided into distinct categories, and they include both paid and free applications. With these categories, users can easily find the applications they need. In this study, we use the K-Means method to examine the characteristics of the available attributes. The App Store dataset was taken from the official Kaggle website. The aim of this study is to analyze the clusters produced by K-Means. The result is a cluster with the characteristics of an ideal application, namely a high user rating, a fairly reasonable price, and a small application size.
24

Yang, Xulei, and Jie Ding. "A Computational Framework for Iceberg and Ship Discrimination: Case Study on Kaggle Competition." IEEE Access 8 (2020): 82320–27. http://dx.doi.org/10.1109/access.2020.2990985.

25

Polak, Julia, and Dianne Cook. "A Study on Student Performance, Engagement, and Experience With Kaggle InClass data Challenges." Journal of Statistics and Data Science Education 29, no. 1 (January 2, 2021): 63–70. http://dx.doi.org/10.1080/10691898.2021.1892554.

26

V T, Neethu. "Prediction of Stress and Mood using Neural Network, LSTM and Transfer learning." International Journal for Research in Applied Science and Engineering Technology 9, no. VI (June 30, 2021): 3136–42. http://dx.doi.org/10.22214/ijraset.2021.35347.

Abstract:
Stress can be a feeling of emotional or physical tension. It can come from any thought or event that makes you feel frustrated, disturbed, angry or nervous. It also affects the mood of the person. This study aims to predict stress and mood based on heart rate variability, which can nowadays be collected using Fitbit devices or Apple watches. In this work, the SWELL dataset available from the Kaggle repository is used. A neural network and an LSTM are used to predict stress and mood. Predicting stress is considered the first task and predicting mood the second task. For the second task, the model created for the first task is reused as a pretrained model, making use of transfer learning.
27

J Alkhatib, Ahed, Amer Mahmoud Sindiani, and Eman Hussein Alshdaifat. "Insulin as a predictor of diabetes type 2: a new medical hypothesis." Advances in Obesity, Weight Management & Control 11, no. 1 (January 5, 2021): 1–3. http://dx.doi.org/10.15406/aowmc.2021.11.00328.

Abstract:
Since the discovery of diabetes, it has been understood in terms of insulin production or function. In this study, we introduce a controversial concept: insulin as a predictor of diabetes. In other words, insulin can cause type 2 diabetes. We think that this could serve as a new medical hypothesis. To examine this hypothesis, we analyzed a dataset from India posted on Kaggle. The dataset included 763 female patients, of whom 497 had no diabetes and 266 had type 2 diabetes. We used routine statistical analysis and neural network analysis. The results showed that the insulin level increases as diabetes progresses, and its relative contribution to diabetes was estimated as 28.4%. Taken together, we recommend that insulin measurement be considered in the management of diabetes.
28

David, Etienne, Mario Serouart, Daniel Smith, Simon Madec, Kaaviya Velumani, Shouyang Liu, Xu Wang, et al. "Global Wheat Head Detection 2021: An Improved Dataset for Benchmarking Wheat Head Detection Methods." Plant Phenomics 2021 (September 22, 2021): 1–9. http://dx.doi.org/10.34133/2021/9846158.

Abstract:
The Global Wheat Head Detection (GWHD) dataset was created in 2020 and has assembled 193,634 labelled wheat heads from 4700 RGB images acquired from various acquisition platforms and 7 countries/institutions. With an associated competition hosted in Kaggle, GWHD_2020 has successfully attracted attention from both the computer vision and agricultural science communities. From this first experience, a few avenues for improvements have been identified regarding data size, head diversity, and label reliability. To address these issues, the 2020 dataset has been reexamined, relabeled, and complemented by adding 1722 images from 5 additional countries, allowing for 81,553 additional wheat heads. We now release in 2021 a new version of the Global Wheat Head Detection dataset, which is bigger, more diverse, and less noisy than the GWHD_2020 version.
29

Deng, Wuhuan, and Eric Zhong. "Analysis and Prediction of Soccer Games: An Application to the Kaggle European Soccer Database." Insight - Statistics 3, no. 1 (November 11, 2020): 1. http://dx.doi.org/10.18282/i-s.v3i1.332.

Abstract:
The study of soccer game data has many applications for both fans and teams. The effective analytical work can not only help the teams to improve their offensive and defensive skills and strategies, but also could assist the fans to make a bet. In this work, the authors study the European League Dataset with statistical methods to analyze the game data. Moreover, machine learning techniques are designed to predict the game results based on in-game performance and pre-game odds provided by bookmakers. With rational feature engineering and model selection, our model results in an overall 95% accuracy.
30

Bahaaaldeen Abdul wahhab, Ahmed, and Aliaa KareemabdulHassan. "Proposed Aspect Based Sentiment Analysis system for English reviews." Journal of Al-Qadisiyah for computer science and mathematics 11, no. 2 (August 27, 2019): 22–36. http://dx.doi.org/10.29304/jqcm.2019.11.2.559.

Abstract:
Reviews are a crucial source of opinions that may influence decisions in many areas. There is therefore a need for an algorithm that efficiently identifies the aspects that reviewers have focused on in their reviews and comments on social networks or other web applications. This paper proposes an approach for aspect-based sentiment analysis that consists of two steps. The first step is a proposed p_chunker algorithm for aspect extraction using Latent Dirichlet Allocation and noun phrase chunking. The second step is sentiment analysis using a proposed hybrid algorithm that depends on both lexicon-based and supervised sentiment analysis to determine the sentiment for the extracted aspects. The proposed paradigm is tested using standard datasets from Kaggle for both aspect extraction and sentiment analysis, and the results show the efficacy of the proposed method.
31

Abdulmunem, Ashwan A., Zinah Abdulridha Abutiheen, and Hiba J. Aleqabie. "Recognition of corona virus disease (COVID-19) using deep learning network." International Journal of Electrical and Computer Engineering (IJECE) 11, no. 1 (February 1, 2021): 365. http://dx.doi.org/10.11591/ijece.v11i1.pp365-374.

Abstract:
Corona virus disease (COVID-19) has had an enormous impact in the last few months. It has caused thousands of deaths around the world. This has prompted rapid research efforts to deal with this new virus. In computer science, much technical research has been done to tackle it using image processing algorithms. In this work, we introduce a method based on deep learning networks to classify COVID-19 from X-ray images. Our results are encouraging enough to rely on for distinguishing infected people from healthy ones. We conduct our experiments on a recent dataset, the Kaggle dataset of COVID-19 X-ray images, using the ResNet50 deep learning network with 5-fold and 10-fold cross-validation. The experimental results show that 5 folds gives more effective results than 10 folds, with an accuracy rate of 97.28%.
32

Castro-Bleda, M. J., S. España-Boquera, J. Pastor-Pellicer, and F. Zamora-Martínez. "The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing." Computer Journal 63, no. 11 (November 30, 2019): 1658–67. http://dx.doi.org/10.1093/comjnl/bxz098.

Abstract:
This paper presents the ‘NoisyOffice’ database. It consists of images of printed text documents with noise mainly caused by uncleanliness from a generic office, such as coffee stains and footprints on documents or folded and wrinkled sheets with degraded printed text. This corpus is intended to train and evaluate supervised learning methods for cleaning, binarization and enhancement of noisy images of grayscale text documents. As an example, several experiments of image enhancement and binarization are presented by using deep learning techniques. Double-resolution images are also provided for testing super-resolution methods. The corpus is freely available at UCI Machine Learning Repository. Finally, a challenge organized by Kaggle Inc. to denoise images, using the database, is described in order to show its suitability for benchmarking of image processing systems.
33

Bhatt, Mittal, Vishal Dahiya, and Arvind K. Singh. "A Comparative Analysis of Classificaton methods for Diagnosis of Lower Back Pain." Oriental journal of computer science and technology 11, no. 2 (June 7, 2018): 135–39. http://dx.doi.org/10.13005/ojcst11.02.09.

Abstract:
In this paper, different classification methods are compared using base-level and meta-level (combination of multiple classifiers for training) classifiers for the diagnosis of Lower Back Pain. Lower Back Pain becomes chronic with age, so it needs to be correctly diagnosed from symptoms at an early age. Five independent classifiers were implemented at base level and meta level. At meta level, five combinations of different classifiers were implemented using a voting technique. According to the scores, the overall classification using Naïve Bayes and Multilayer Perceptron achieved the maximum efficiency of 83.87%. The purpose of this paper is to diagnose healthy individuals efficiently. To carry out the study, the Lower Back Pain Symptoms Dataset from Kaggle, a very well-known platform for predictive modeling, is used. The experiments were carried out in WEKA (Waikato Environment for Knowledge Analysis), a suite of machine learning software.
34

Rokade, Prakash Pandharinath, and Aruna Kumari D. "Business recommendation based on collaborative filtering and feature engineering – aproposed approach." International Journal of Electrical and Computer Engineering (IJECE) 9, no. 4 (August 1, 2019): 2614. http://dx.doi.org/10.11591/ijece.v9i4.pp2614-2619.

Abstract:
Business decisions for any service or product depend on people's sentiments. We obtain these sentiments or ratings from social websites such as Twitter and Kaggle. The mood of people towards any event, service or product is expressed in these sentiments or ratings. The text of a sentiment contains different linguistic features of the sentence. A sentiment sentence also contains other features that play a vital role in deciding the polarity of sentiments. If feature selection is done properly, one can extract better sentiments for decision making. Directed preprocessing will feed filtered input to any machine learning approach. Feature-based collaborative filtering can be used for better sentiment analysis. Better use of parts of speech (POS), followed by guided preprocessing and evaluation, will minimize error in sentiment polarity, and hence better recommendations to the user for business analytics can be attained.
35

Zuo, Qiang, Songyu Chen, and Zhifang Wang. "R2AU-Net: Attention Recurrent Residual Convolutional Neural Network for Multimodal Medical Image Segmentation." Security and Communication Networks 2021 (June 10, 2021): 1–10. http://dx.doi.org/10.1155/2021/6625688.

Abstract:
In recent years, semantic segmentation methods based on deep learning have provided advanced performance in medical image segmentation. As one of the typical segmentation networks, U-Net has been successfully applied to multimodal medical image segmentation. A recurrent residual convolutional neural network with attention gate connection (R2AU-Net) based on U-Net is proposed in this paper. It enhances the capability of integrating contextual information by replacing the basic convolutional units in U-Net with recurrent residual convolutional units. Furthermore, R2AU-Net adopts attention gates instead of the original skip connection. In this paper, the experiments are performed on three multimodal datasets: ISIC 2018, DRIVE, and the public dataset used in LUNA and the Kaggle Data Science Bowl 2017. Experimental results show that R2AU-Net achieves much better performance than other improved U-Net algorithms for multimodal medical image segmentation.
36

Noever, David A., and Samantha E. Miller Noever. "Deep Learning Classification Methods Applied to Tabular Cybersecurity Benchmarks." International Journal of Network Security & Its Applications 13, no. 03 (May 31, 2021): 1–13. http://dx.doi.org/10.5121/ijnsa.2021.13301.

Abstract:
This research recasts the network attack dataset from UNSW-NB15 as an intrusion detection problem in image space. Using one-hot-encodings, the resulting grayscale thumbnails provide a quarter-million examples for deep learning algorithms. Applying the MobileNetV2’s convolutional neural network architecture, the work demonstrates a 97% accuracy in distinguishing normal and attack traffic. Further class refinements to 9 individual attack families (exploits, worms, shellcodes) show an overall 54% accuracy. Using feature importance rank, a random forest solution on subsets shows the most important source-destination factors and the least important ones as mainly obscure protocols. It further extends the image classification problem to other cybersecurity benchmarks such as malware signatures extracted from binary headers, with an 80% overall accuracy to detect computer viruses as portable executable files (headers only). Both novel image datasets are available to the research community on Kaggle.
37

MAGRANS DE ABRIL, Ildefons, and Masashi SUGIYAMA. "Winning the Kaggle Algorithmic Trading Challenge with the Composition of Many Models and Feature Engineering." IEICE Transactions on Information and Systems E96.D, no. 3 (2013): 742–45. http://dx.doi.org/10.1587/transinf.e96.d.742.

38

Bahad, Pritika, Preeti Saxena, and Raj Kamal. "Exploratory and Predictive Analytics of User Preferences from Kaggle LEGO-Toys Datasets Using Spark ML." IOP Conference Series: Materials Science and Engineering 1099, no. 1 (March 1, 2021): 012019. http://dx.doi.org/10.1088/1757-899x/1099/1/012019.

39

Feng, Kaicheng, and Xiaobing Liu. "Adaptive Attention with Consumer Sentinel for Movie Box Office Prediction." Complexity 2020 (December 7, 2020): 1–9. http://dx.doi.org/10.1155/2020/6689304.

Abstract:
To improve movie box office prediction accuracy, this paper proposes an adaptive attention with consumer sentinel (LSTM-AACS) model for movie box office prediction. First, the influencing factors of the movie box office are analyzed. To tackle the problem of existing prediction models ignoring consumer groups, we add consumer features and then quantitatively analyze and normalize the box office influence factors. Second, we establish an LSTM (Long Short-Term Memory) box office prediction model and inject the attention mechanism to construct an adaptive attention with consumer sentinel for movie box office prediction. Finally, a dataset of 10,398 movie box office records used in a Kaggle competition is employed to compare the prediction results of the LSTM-AACS model, LSTM-Attention model, and LSTM model. The results show that the relative error of LSTM-AACS prediction is 6.58%, which is lower than that of the other models used in the experiment.
40

K, Manikantha, Aishwarya R Bhat, Pavani Nerella, Pooja Baburaj, and Sharvari K S. "A Comparative Study of Transfer Learning Models for Offline Signature Verification and Forgery Detection." Journal of University of Shanghai for Science and Technology 23, no. 07 (July 23, 2021): 1129–39. http://dx.doi.org/10.51201/jusst/21/07272.

Abstract:
Recognising one’s identity to enter a system is called authentication. This process can take various forms where users input the system with a set of identifying credentials to access the system. Signatures belong to behavioural biometric, where the distinct features of every individual are considered in order to corroborate the person’s identity. The act of falsely imitating one’s signature biometric to impersonate and leverage access to their asset is called signature forgery. Our paper presents a comparative study of various deep learning models using Siamese architecture, over a wide catalogue of signature images. Openly available datasets like CEDAR, Handwritten Signatures dataset from Kaggle, ICDAR 2011 SigComp, and BH-Sig260 signature corpus are used to train the models. A set of classifiers – Support Vector Classifiers (SVC), Gaussian Naïve Bayes (GNB), Logistic Regression (LR) and K-Nearest Neighbours (KNN) are applied sequentially to classify the signature as genuine or forged.
41

Mohd Noor, Farhan Nabil, Wan Hasbullah Mohd Isa, and Anwar P.P. Abdul Majeed. "The Diagnosis Of Diabetic Retinopathy By Means Of Transfer Learning With Conventional Machine Learning Pipeline." MEKATRONIKA 2, no. 2 (December 16, 2020): 62–67. http://dx.doi.org/10.15282/mekatronika.v2i2.6769.

Abstract:
Diabetic Retinopathy is one of the common eye diseases arising as a complication of diabetes mellitus. Cotton wool spots, rough exudates, haemorrhages and microaneurysms are the symptoms of diabetic retinopathy, resulting from fluid leakage caused by high blood glucose levels. Early treatment to prevent permanent blindness is important, as it could save the vision of patients with diabetic retinopathy. Hence, in this study, we propose an automated detection method to diagnose diabetic retinopathy. The dataset was obtained from the Kaggle Database and divided for training, testing and validation purposes. Furthermore, a Transfer Learning model, namely VGG19, was employed to extract the features before they were processed by Machine Learning classifiers, namely SVM, kNN and RF, to classify diabetic retinopathy. The VGG19-SVM pipeline produced the best accuracy in the training, testing and validation processes, achieving 99, 99 and 96 percent respectively.
42

B, Chitluri Sai Harish, G. gnana krishna vamsi, G. jaya phani akhil, J. n. v. hari sravan, and V. mounika chowdary. "Prediction of Heart Stroke using A Novel Framework – PySpark." International Journal of Preventive Medicine and Health 1, no. 2 (May 10, 2021): 1–4. http://dx.doi.org/10.35940/ijpmh.b1002.051221.

Abstract:
Heart diseases are one of the most challenging problems faced by health care sectors all over the world. These diseases are very common nowadays. With the growing number of deaths due to heart illnesses, there is a need to build a system that predicts heart ailments precisely. The work in this paper focuses on finding the best Machine Learning algorithm for the identification of heart diseases. Our study compares the precision of three well-known classification algorithms, Decision Tree, Naïve Bayes and Random Forest, for the prediction of heart disease using a dataset provided by Kaggle. We utilized various characteristics that relate well to heart disease to find the better algorithm for prediction. The result of this study indicates that the Random Forest algorithm is the most efficient algorithm for the prediction of heart disease, with an accuracy score of 97.17%.
43

Xue, L., C. Liu, Y. Wu, and H. Li. "SEMANTIC SEGMENTATION OF CONVOLUTIONAL NEURAL NETWORK FOR SUPERVISED CLASSIFICATION OF MULTISPECTRAL REMOTE SENSING." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-3 (April 30, 2018): 2035–39. http://dx.doi.org/10.5194/isprs-archives-xlii-3-2035-2018.

Abstract:
Semantic segmentation is a fundamental research topic in remote sensing image processing. Because of the complex maritime environment, the classification of roads, vegetation, buildings and water from remote sensing imagery is a challenging task. Although neural networks have achieved excellent performance in semantic segmentation in recent years, there are few works using CNNs for ground object segmentation, and the results could be further improved. This paper uses a convolutional neural network named U-Net, whose structure has a contracting path and an expansive path to produce high-resolution output. In the network, we added BN layers, which are more conducive to the backward pass. Moreover, after the upsampling convolution, we added dropout layers to prevent overfitting. These additions help to obtain more precise segmentation results. To verify this network architecture, we used a Kaggle dataset. Experimental results show that U-Net achieved good performance compared with other architectures, especially on high-resolution remote sensing imagery.
44

Fu, Lifang, Xingchen Lv, Qiufeng Wu, and Chengyan Pei. "Field Weed Recognition Based on an Improved VGG With Inception Module." International Journal of Agricultural and Environmental Information Systems 11, no. 2 (April 2020): 1–13. http://dx.doi.org/10.4018/ijaeis.2020040101.

Abstract:
The precision spraying of herbicides can significantly reduce herbicide use, and recognizing different field weeds is an important part of it. In order to enhance the efficiency and accuracy of field weed recognition, this article proposes a field weed recognition algorithm based on the VGG model, called VGG Inception (VGGI). In this article, three optimizations were made. First, the number of convolution layers was reduced to reduce the parameters of the network. Then, the Inception structure was added, which can maintain the main features and give better classification accuracy. Finally, data augmentation and transfer learning methods were used to prevent over-fitting and further enhance the field weed recognition effect. The Kaggle Images dataset was used in the experiment. This work achieved greater than 98% precision in the detection of field weeds. In an actual field, the accuracy could reach 80%. This indicates that the VGGI model has outstanding identification performance for seedlings and has significant potential for actual field weed recognition.
45

Pardede, Jasman. "Deteksi Komentar Cyberbullying Pada Media Sosial Berbahasa Inggris Menggunakan Naïve Bayes Classification." Jurnal Informatika 7, no. 1 (April 6, 2020): 46–54. http://dx.doi.org/10.31311/ji.v7i1.6920.

Abstract:
The rapid development of technology and social media makes it easy for users to convey information. However, social media also has a negative impact when people post cruel writing or comment arbitrarily without thinking about the consequences for others. This is one of the causes of acts of violence in cyberspace (cyberbullying). The initial stage of this research is language processing, known as text preprocessing, which includes tokenizing, case folding, stopword removal and stemming. Next comes feature selection, which converts text documents into a matrix in order to obtain features for each word to be used as classification parameters or criteria. To decide whether a comment carries a bully or non-bully meaning, the Naïve Bayes Classification algorithm with a multinomial naïve Bayes model is used. The calculation computes the probability of each word that appears based on its class and the product of the class conditional probabilities. Based on experiments using the “cyberbullying comments” dataset taken from Kaggle, the accuracy obtained is 80%, with precision of 81% and recall of 80%.
46

Postalcıoğlu, Seda. "Performance Analysis of Different Optimizers for Deep Learning-Based Image Recognition." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 02 (June 14, 2019): 2051003. http://dx.doi.org/10.1142/s0218001420510039.

Abstract:
Deep learning here refers to the Convolutional Neural Network (CNN). A CNN is used for image recognition in this study. The dataset, named Fruits-360, is obtained from Kaggle. Seventy percent of the pictures are selected as training data and the rest of the images are used for testing. In this study, an image size is [Formula: see text]. Training is carried out using the Stochastic Gradient Descent with Momentum (sgdm), Adaptive Moment Estimation (adam) and Root Mean Square Propagation (rmsprop) techniques. The threshold value for training is set at 98%. When the accuracy exceeds 98%, training is stopped. The final validation accuracy is calculated using the trained network. In this study, more than 98% of the predicted labels match the true labels of the validation set. Accuracies are calculated using test data for the sgdm, adam and rmsprop techniques. The results are 98.08%, 98.85% and 98.88%, respectively. It is clear that fruits are recognized with good accuracy.
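The stopping rule described in this abstract (halt training once accuracy exceeds a 98% threshold) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `first_epoch_above` and the accuracy history are hypothetical and do not come from the cited paper.

```python
def first_epoch_above(accuracies, threshold=0.98):
    """Return the 1-based epoch at which accuracy first exceeds the threshold.

    Returns None if the threshold is never exceeded.
    """
    for epoch, accuracy in enumerate(accuracies, start=1):
        if accuracy > threshold:
            return epoch
    return None


# Made-up per-epoch training accuracies, for illustration only.
history = [0.91, 0.955, 0.979, 0.984, 0.99]
stop_epoch = first_epoch_above(history)
print(stop_epoch)
```

The comparison is strict, matching the abstract's wording "more than 98%", so an accuracy of exactly 0.98 would not stop training.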
47

Maurya, Roshankumar Ramashish, and Anand Khandare. "Enhance Clustering Algorithm Using Optimization." International Journal of Research in Engineering, Science and Management 3, no. 9 (September 28, 2020): 136–42. http://dx.doi.org/10.47607/ijresm.2020.313.

Abstract:
Unsupervised learning can reveal the structure of datasets without being concerned with any labels, and K-means clustering is one such method. Traditionally, the initial clusters have been selected randomly, with the idea that the algorithm will generate better clusters. However, studies have shown that there are methods to improve this initial clustering as well as the K-means process. This paper examines these results on different types of datasets to study whether they hold for all types of data. Another method used for unsupervised clustering is the algorithm based on Particle Swarm Optimization (PSO). In its second part, this paper studies the classic K-means algorithm and a hybrid K-means algorithm that uses PSO to improve the results from K-means. The hybrid K-means algorithms are compared to standard K-means clustering on two benchmark classification problems. In this project, we used Kaggle datasets of different sizes (small, medium and large) to compare PSO, K-means and hybrid K-means.
48

Nikulin, Vladimir. "Prediction of the Shoppers Loyalty with Aggregated Data Streams." Journal of Artificial Intelligence and Soft Computing Research 6, no. 2 (April 1, 2016): 69–79. http://dx.doi.org/10.1515/jaiscr-2016-0007.

Abstract:
Consumer brands often offer discounts to attract new shoppers to buy their products. The most valuable customers are those who return after this initial incentive purchase. With enough purchase history, it is possible to predict which shoppers, when presented an offer, will buy a new item. While dealing with Big Data and with data streams in particular, it is a common practice to summarize or aggregate customers’ transaction history over periods of a few months. As an outcome, we compress the given huge volume of data, and transfer the data stream to the standard rectangular format. Consequently, we can explore a variety of practically or theoretically motivated tasks. For example, we can rank the given field of customers according to their loyalty or intention to repurchase in the near future. This objective has a very important practical application. It leads to preferential treatment of the right customers. We tested our model (with competitive results) online during the Kaggle-based Acquire Valued Shoppers Challenge in 2014.
49

Madhu, M. S., and Dr Kirupa Ganapathy. "Detection of Liver Disorder Using RBF SVM in Comparison with Naïve Bayes to Measure the Accuracy, Precision, Sensitivity and Specificity." Alinteri Journal of Agriculture Sciences 36, no. 1 (June 29, 2021): 657–64. http://dx.doi.org/10.47059/alinteri/v36i1/ajas21093.

Abstract:
Aim: Machine learning techniques are increasingly used in medical research due to their impressive results in the diagnosis and prediction of diseases. The objective of this study is to evaluate the performance of an SVM classifier in the identification of liver disorder by comparing it with the Naive Bayes algorithm. Methods and Materials: A total of 31619 samples are collected from three liver disease datasets available on Kaggle. These samples are divided into a training dataset (n = 22133 [70%]) and a test dataset (n = 9486 [30%]). Accuracy, precision, specificity and sensitivity values are calculated to quantify the performance of the SVM algorithm. Results: SVM achieved accuracy, precision, sensitivity and specificity of 73.64%, 97.82%, 97.56% and 69.77% respectively, compared to 57.31%, 41.39%, 94.87% and 37.20% for the Naive Bayes algorithm. Conclusion: In this study, it is found that the RBF SVM algorithm performed better than the Naive Bayes algorithm in liver disorder detection on the datasets considered.
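The four measures named in this abstract have standard definitions in terms of confusion-matrix counts, and the reported 70/30 split can be checked arithmetically. The sketch below is illustrative only: the function names and the example counts are hypothetical, and it does not reproduce the cited paper's results or code.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # correct predictions over all samples
        "precision": tp / (tp + fp),        # true positives over predicted positives
        "sensitivity": tp / (tp + fn),      # true positives over actual positives
        "specificity": tn / (tn + fp),      # true negatives over actual negatives
    }


def split_sizes(n_samples, train_fraction=0.7):
    """Sizes of a train/test split, rounding the training part to the nearest sample."""
    n_train = round(n_samples * train_fraction)
    return n_train, n_samples - n_train


# Made-up counts, for illustration only.
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)

# The sample counts quoted in the abstract are consistent with a 70/30 split.
print(split_sizes(31619))
```

With 31619 samples, a 70% training share rounds to 22133, which leaves 9486 for testing, matching the figures quoted in the abstract.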
50

Sadgali, Imane, Nawal Sael, and Fouazia Benabbou. "Bidirectional gated recurrent unit for improving classification in credit card fraud detection." Indonesian Journal of Electrical Engineering and Computer Science 21, no. 3 (March 10, 2021): 1704. http://dx.doi.org/10.11591/ijeecs.v21.i3.pp1704-1712.

Abstract:
In recent years, the use of credit cards around the world has grown enormously. Thus, the number of fraud cases has also increased, resulting in losses of thousands of dollars worldwide. Therefore, it is mandatory to use techniques that are able to assist in the detection of credit card fraud. For this purpose, we have proposed a multi-level architecture, composed of four levels: authentication level, behavioral level, smart level and background processing level. In this paper, we focus on the implementation of the smart level. The aim of this level is to develop a classifier for the detection of credit card fraud, using bidirectional gated recurrent units (BGRU). The experiments, applied to a well-known credit card fraud dataset from Kaggle, show that this model has peak performance compared to other proposed models, with an accuracy rate of 97.16% and an area under the ROC curve of 99.66%.