Journal articles on the topic 'Knowledge discovery model (KDM)'
Consult the top 50 journal articles for your research on the topic 'Knowledge discovery model (KDM).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Arcelli Fontana, Francesca, Claudia Raibulet, and Marco Zanoni. "Alternatives to the Knowledge Discovery Metamodel: An Investigation." International Journal of Software Engineering and Knowledge Engineering 27, no. 7 (2017): 1097–128. http://dx.doi.org/10.1142/s0218194017500413.

Abstract:
To better understand and exploit the knowledge necessary to comprehend and evolve an existing system, different models can be extracted from it. Models represent the extracted information at various abstraction levels, and are useful to document, maintain, and reengineer the system. The Knowledge Discovery Metamodel (KDM) has been defined by the Object Management Group as a meta-model supporting a large share of reverse engineering activities. Its specification was also adopted by ISO in 2012. This paper explores and describes alternative meta-models proposed in the literature to support reverse engineering, program comprehension, and software evolution activities. We focus on the similarities and differences of the alternative meta-models with respect to KDM, trying to understand the potential for reciprocal information interchange. We describe KDM and five other meta-models, plus their extensions available in the literature and their diffusion in the reverse engineering community. We also investigate the approaches using KDM and the five meta-models. In the paper, we underline the limited reuse of models for reverse engineering and identify potential directions for future related research, to enhance the existing models and ease the exchange of information among them.
2

Amine, Moutaouakkil, and Mbarki Samir. "PHP modernization approach generating KDM models from PHP legacy code." Bulletin of Electrical Engineering and Informatics 9, no. 1 (2020): 247–55. https://doi.org/10.11591/eei.v9i1.1269.

Abstract:
With the rise of new web technologies such as Web 2.0, jQuery, and Bootstrap, modernizing legacy web systems to benefit from the advantages of the new technologies is more and more relevant. The migration of a system from one environment to another is a time- and effort-consuming process; it involves a complete rewrite of the application adapted to the target platform. To realize this migration in an automated and standardized way, many approaches have tried to define standardized engineering processes. Architecture-Driven Modernization (ADM) defines an approach to standardize and automate the reengineering process. We defined an ADM approach to represent PHP web applications at the highest level of abstraction in models. To do this, we used software artifacts as an entry point. This paper describes the extraction process, which permits discovering and understanding the legacy system and generating models that represent it in an abstract way.
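
A toy sketch can make the extraction step concrete: scan legacy source text and record its classes and functions in a dictionary standing in for a simple KDM-like code model. The snippet below is an illustration under assumed inputs, not the authors' tool; real extractors use full PHP parsers rather than regular expressions.

```python
import re

# Toy "discoverer": harvest class and function declarations from PHP source
# and store them in a nested dict standing in for a simple KDM-like model.
PHP_SOURCE = """
<?php
class Cart {
    function addItem($sku, $qty) { /* ... */ }
    function total() { /* ... */ }
}
function render($cart) { /* ... */ }
"""

def extract_model(source: str) -> dict:
    model = {"classes": {}, "functions": []}
    current = None  # class currently being scanned, if any
    for line in source.splitlines():
        cls = re.match(r"\s*class\s+(\w+)", line)
        fn = re.match(r"\s*function\s+(\w+)", line)
        if cls:
            current = cls.group(1)
            model["classes"][current] = []
        elif fn and current:
            model["classes"][current].append(fn.group(1))
        elif fn:
            model["functions"].append(fn.group(1))
        elif line.rstrip() == "}":  # closing brace of the class body
            current = None
    return model

print(extract_model(PHP_SOURCE))
# {'classes': {'Cart': ['addItem', 'total']}, 'functions': ['render']}
```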
3

Sharma, Sumana, and Kweku-Muata Osei-Bryson. "Toward an integrated knowledge discovery and data mining process model." Knowledge Engineering Review 25, no. 1 (2010): 49–67. http://dx.doi.org/10.1017/s0269888909990361.

Abstract:
The knowledge discovery and data mining (KDDM) process models describe the various phases (e.g. business understanding, data understanding, data preparation, modeling, evaluation and deployment) of the KDDM process. They act as a roadmap for implementation of the KDDM process by presenting a list of tasks for executing the various phases. The checklist approach of describing the tasks is not adequately supported by appropriate tools, which specify ‘how’ the particular task can be implemented. This may result in tasks not being implemented. Another disadvantage is that the long checklist does not capture or leverage the dependencies that exist among the various tasks of the same and different phases. This not only makes the process cumbersome to implement, but also hinders possibilities for semi-automation of certain tasks. Given that each task in the process model serves an important goal and even affects the execution of related tasks due to the dependencies, these limitations are likely to negatively affect the efficiency and effectiveness of KDDM projects. This paper proposes an improved KDDM process model that overcomes these shortcomings by prescribing tools for supporting each task as well as identifying and leveraging dependencies among tasks for semi-automation of tasks, wherever possible.
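
The dependency argument is easy to make concrete: if each KDDM task declares its prerequisites, an execution order follows mechanically, which is one route to the semi-automation the authors call for. Below is a minimal sketch using Python's standard graphlib, with an illustrative task list that is not the paper's actual inventory.

```python
from graphlib import TopologicalSorter

# Illustrative task -> prerequisites map across KDDM phases (assumed names).
tasks = {
    "collect_data": set(),
    "describe_data": {"collect_data"},
    "clean_data": {"describe_data"},
    "select_features": {"clean_data"},
    "build_model": {"select_features"},
    "evaluate_model": {"build_model"},
    "deploy": {"evaluate_model"},
}

# TopologicalSorter yields an order that respects every dependency,
# so downstream tasks can be scheduled (or semi-automated) safely.
order = list(TopologicalSorter(tasks).static_order())
print(order)
```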
4

Deng, Jun, Jian Li, and Daoyao Wang. "Knowledge Discovery from Vibration Measurements." Scientific World Journal 2014 (2014): 1–15. http://dx.doi.org/10.1155/2014/917524.

Abstract:
The framework, as well as the particular algorithms, of the pattern recognition process is widely adopted in structural health monitoring (SHM). However, as part of the overall process of knowledge discovery in databases (KDD), the results of pattern recognition are only changes and patterns of changes in data features. In this paper, based on the similarity between KDD and SHM and considering the particularity of SHM problems, a four-step SHM framework is proposed that extends the final goal of SHM from detecting damage to extracting knowledge that facilitates decision making. The purposes and proper methods of each step of this framework are discussed. To demonstrate the proposed SHM framework, a specific SHM method composed of second-order structural parameter identification, statistical control chart analysis, and system reliability analysis is then presented. To examine the performance of this SHM method, real sensor data measured from a lab-size steel bridge model structure are used. The developed four-step SHM framework has the potential to clarify the process of SHM and facilitate the further development of SHM techniques.
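
The statistical control chart step in the demonstrated method can be illustrated by a basic Shewhart-style chart: estimate the mean and standard deviation of a monitored feature from a baseline period, then flag samples outside three-sigma limits. This is a generic sketch on simulated data, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
baseline = rng.normal(loc=1.0, scale=0.05, size=200)       # healthy-state feature
monitoring = np.concatenate([rng.normal(1.0, 0.05, 50),
                             rng.normal(1.3, 0.05, 10)])   # simulated damage shift

mu, sigma = baseline.mean(), baseline.std(ddof=1)
ucl, lcl = mu + 3 * sigma, mu - 3 * sigma                  # three-sigma control limits

out_of_control = np.where((monitoring > ucl) | (monitoring < lcl))[0]
print(f"limits: [{lcl:.3f}, {ucl:.3f}], alarms at samples {out_of_control}")
```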
5

YANG, BINGRU, JIANGTAO SHEN, and WEI SONG. "KDK BASED DOUBLE-BASIS FUSION MECHANISM AND ITS PROCESS MODEL." International Journal on Artificial Intelligence Tools 14, no. 3 (2005): 399–423. http://dx.doi.org/10.1142/s021821300500217x.

Abstract:
Knowledge Discovery in Knowledge Base (KDK) opens new horizons for research. KDK and KDD (Knowledge Discovery in Databases) concern different cognitive fields and discovery processes, and in most people's view they are independent of each other. In this paper we summarize the following contributions. First, we discuss two kinds of process models and mining algorithms for KDK, based on facts and on rules in a knowledge base. Second, we prove the inherent relation between KDD and KDK (i.e., the double-basis fusion mechanism). Third, we present a new process model and implementation technology for KDK*. Finally, a simulation experiment demonstrates the validity of the above mechanism and process model.
6

Li, Jing Min, Jin Yao, and Yong Mou Liu. "A Model for Acquisition of Implicit Design Knowledge Based on KDD." Materials Science Forum 505-507 (January 2006): 505–10. http://dx.doi.org/10.4028/www.scientific.net/msf.505-507.505.

Abstract:
Knowledge discovery in databases (KDD) represents a new direction in data processing and knowledge innovation. Design is a knowledge-intensive process driven by various design objectives. Implicit knowledge acquisition is the key difficulty for intelligent design systems applied to mechanical product design. In this study, the characteristics of implicit design knowledge and KDD are analyzed, a model for product design knowledge acquisition is set up, and the key techniques, including the expression and application of domain knowledge and the methods of knowledge discovery, are discussed. An example illustrates that the proposed method can effectively obtain the engineering knowledge contained in design cases and can improve the quality and intelligence of product design.
7

Zemmouri, EL Moukhtar, Hicham Behja, Abdelaziz Marzak, and Brigitte Trousse. "Ontology-Based Knowledge Model for Multi-View KDD Process." International Journal of Mobile Computing and Multimedia Communications 4, no. 3 (2012): 21–33. http://dx.doi.org/10.4018/jmcmc.2012070102.

Abstract:
Knowledge Discovery in Databases (KDD) is a highly complex, iterative and interactive process that involves several types of knowledge and expertise. In this paper the authors propose to support users of a multi-view analysis (a KDD process held by several experts who analyze the same data with different viewpoints). Their objective is to enhance both the reusability of the process and coordination between users. To do so, they propose a formalization of the viewpoint notion in KDD and a knowledge model that structures the domain knowledge involved in a multi-view analysis. The authors' formalization of the viewpoint notion, using OWL ontologies, is based on the CRISP-DM standard, through the identification of a set of generic criteria that characterize a viewpoint in KDD.
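
To give a flavor of what an OWL formalization of the viewpoint notion might look like, the snippet below builds a tiny ontology fragment with rdflib; the namespace, class and property names are invented for illustration and do not reproduce the authors' ontology.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/multiview-kdd#")  # assumed namespace
g = Graph()
g.bind("ex", EX)

# A Viewpoint class and a generic criterion property (illustrative names).
g.add((EX.Viewpoint, RDF.type, OWL.Class))
g.add((EX.hasAnalysisGoal, RDF.type, OWL.DatatypeProperty))
g.add((EX.hasAnalysisGoal, RDFS.domain, EX.Viewpoint))

# One expert's viewpoint on the shared data set.
g.add((EX.UsageAnalystView, RDF.type, EX.Viewpoint))
g.add((EX.UsageAnalystView, EX.hasAnalysisGoal, Literal("predict churn")))

print(g.serialize(format="turtle"))
```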
8

Jahani, Alireza, Peyman Akhavan, Mostafa Jafari, and Mohammad Fathian. "Conceptual model for knowledge discovery process in databases based on multi-agent system." VINE Journal of Information and Knowledge Management Systems 46, no. 2 (2016): 207–31. http://dx.doi.org/10.1108/vjikms-01-2015-0003.

Abstract:
Purpose: Knowledge discovery in databases (KDD) is a tedious and repetitive process. A challenge for the effective use of KDD is understanding and confirming its results derived from the harmonized process. To exploit the advantages of applying agents, this paper aims to propose a conceptual model based on a multi-agent system (MAS) to control each step of the KDD process. Design/methodology/approach: This paper reports the empirical findings of a survey conducted among academic and industrial sectors in Tehran, Iran. In this survey, the participants answered a questionnaire about the main factors in designing a suitable model for the KDD process based on MAS. The factor analysis reveals important insights from previous models developed by various researchers. Findings: This research uses the survey results to find six critical success factors for a proper conceptual model of the KDD process based on MAS: continuity in refinement and improvement; learning and acting concurrently; a loosely or tightly coupled approach for using technologies; a cooperative, dynamic and flexible environment; documentation and reporting; and extracting and evaluating knowledge intelligently. Research limitations/implications: The proposed model reflects all aspects of the KDD process by applying intelligent agents to each process step. However, this research only considers Iranian society; hence, it cannot be generalized to other nations, and it may need further research in other countries and implementation in real-world business domains. Originality/value: This research helps organizations adopt the proposed model and implement a KDD process to take advantage of the valuable knowledge that exists in their data resources.
9

Resell, Mathilde, Elisabeth Pimpisa Graarud, Hanne-Line Rabben, et al. "Knowledge Discovery in Databases of Proteomics by Systems Modeling in Translational Research on Pancreatic Cancer." Proteomes 13, no. 2 (2025): 20. https://doi.org/10.3390/proteomes13020020.

Abstract:
Background: Knowledge discovery in databases (KDD) can contribute to translational research, also known as translational medicine, by bridging the gap between in vitro and in vivo studies and clinical applications. Here, we propose a 'systems modeling' workflow for KDD. Methods: This framework includes the data collection of a composition model (various research models), a processing model (proteomics) and an analytical model (bioinformatics, artificial intelligence/machine learning and pattern evaluation), knowledge presentation, and feedback loops for hypothesis generation and validation. We applied this workflow to study pancreatic ductal adenocarcinoma (PDAC). Results: We identified the common proteins between human PDAC and various research models in vitro (cells, spheroids and organoids) and in vivo (mice). Accordingly, we hypothesized potential translational targets among hub proteins and the related signaling pathways, PDAC-specific proteins and signature pathways, and high topological proteins. Conclusions: This systems modeling workflow can be a valuable method for KDD, facilitating the discovery of translational targets in general, and for PDAC in particular in this case.
10

ZHONG, NING, CHUNNIAN LIU, and SETSUO OHSUGA. "DYNAMICALLY ORGANIZING KDD PROCESSES." International Journal of Pattern Recognition and Artificial Intelligence 15, no. 3 (2001): 451–73. http://dx.doi.org/10.1142/s0218001401000976.

Abstract:
How to increase both autonomy and versatility of a knowledge discovery system is a core problem and a crucial aspect of KDD (Knowledge Discovery and Data Mining). Within the framework of the KDD process and the GLS (Global Learning Scheme) system recently proposed by us, this paper describes a way of increasing both autonomy and versatility of a KDD system by dynamically organizing KDD processes. In our approach, the KDD process is modeled as an organized society of KDD agents with multiple levels. We propose an ontology to describe KDD agents, in the style of OOER (Object Oriented Entity Relationship) data model. Based on this ontology of KDD agents, we apply several AI planning techniques, which are implemented as a meta-agent, so that we might (1) solve the most difficult problem in a multiagent KDD system: how to automatically choose appropriate KDD techniques (KDD agents) to achieve a particular discovery goal in a particular application domain; (2) tackle the complexity of KDD process; and (3) support evolution of KDD data, knowledge and process. The GLS system, as a multistrategy and multiagent KDD system based on the methodology, increases both autonomy and versatility.
11

Guo, Yu Dong. "Prototype System of Knowledge Management Based on Data Mining." Applied Mechanics and Materials 411-414 (September 2013): 251–54. http://dx.doi.org/10.4028/www.scientific.net/amm.411-414.251.

Abstract:
Knowledge is a crucial resource for promoting economic development and societal progress; it includes facts, information, descriptions, and skills acquired through experience or education. As knowledge has become increasingly prominent, knowledge management has become an important measure for promoting a corporation's core competences. The paper begins with the definition of knowledge management, and studies the process of knowledge discovery in databases (KDD), data mining techniques, and the SECI (Socialization, Externalization, Combination, Internalization) model of knowledge dimensions. Finally, a simple knowledge management prototype system based on KDD and data mining is proposed.
12

Jie, Hu, and Ji Long Yin. "Knowledge Discovery and Management from Numerical Simulation and its Application to Robust Optimization of Extrusion-Forging Processing." Key Engineering Materials 340-341 (June 2007): 659–64. http://dx.doi.org/10.4028/www.scientific.net/kem.340-341.659.

Abstract:
Numerical simulation technology has been used widely in the plastic forming area. However, the simulation of increasingly complex forming processes leads to the generation of vast quantities of data, which imply much useful knowledge. Consequently, domain knowledge is very significant to product design and process development in the metal plastic forming area. The paper presents a new robust optimization method based on knowledge discovery from numerical simulation. First, a knowledge discovery model from numerical simulation is established. In this model, interval-based rule presentation is adopted to describe the uncertainty of design parameters quantitatively and enhance design robustness. Second, the optimization process based on knowledge discovery and management is presented, and a genetic algorithm is used to obtain the robust optimization parameters. Finally, the application to robust optimization of an extrusion-forging process is analyzed to show that the scheme is effective. The proposed method can overcome the pathologies in simulation optimization and improve the efficiency and robustness of design optimization.
13

KHOSHGOFTAAR, TAGHI M., EDWARD B. ALLEN, WENDELL D. JONES, and JOHN P. HUDEPOHL. "DATA MINING FOR PREDICTORS OF SOFTWARE QUALITY." International Journal of Software Engineering and Knowledge Engineering 9, no. 5 (1999): 547–63. http://dx.doi.org/10.1142/s0218194099000309.

Abstract:
"Knowledge discovery in data bases" (KDD) for software engineering is a process for finding useful information in the large volumes of data that are a byproduct of software development, such as data bases for configuration management and for problem reporting. This paper presents guidelines for extracting innovative process metrics from these commonly available data bases. This paper also adapts the Classification And Regression Trees algorithm, CART, to the KDD process for software engineering data. To our knowledge, this algorithm has not been used previously for empirical software quality modeling. In particular, we present an innovative way to control the balance between misclassification rates. A KDD case study of a very large legacy telecommunications software system found that variables derived from source code, configuration management transactions, and problem reporting transactions can be useful predictors of software quality. The KDD process discovered that for this software development environment, out of forty software attributes, only a few of the predictor variables were significant. This resulted in a model that predicts whether modules are likely to have faults discovered by customers. Software developers need such predictions early in development to target software enhancement techniques to the modules that need improvement the most.
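
A convenient modern stand-in for the misclassification-balance idea above is class weighting in a CART-style tree. The sketch below uses scikit-learn's decision tree (not the original CART implementation) on synthetic module data, with an assumed 1:5 cost ratio.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for module metrics: X = software measures, y = fault-prone.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Penalize missing a fault-prone module 5x more than a false alarm,
# shifting the balance between the two misclassification rates.
tree = DecisionTreeClassifier(class_weight={0: 1, 1: 5}, max_depth=5,
                              random_state=42).fit(X_tr, y_tr)
print(confusion_matrix(y_te, tree.predict(X_te)))
```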
14

Ma, Ming Lei, Gui Ling Wang, Dong Mei Miao, and Gui Jun Xian. "Applying KDD to a Structure Health Monitoring System Based on a Real Sited Bridge: Model Reshaping Case." Applied Mechanics and Materials 472 (January 2014): 535–38. http://dx.doi.org/10.4028/www.scientific.net/amm.472.535.

Abstract:
The knowledge discovery in databases (KDD) method aims to address the problem of massive data. In bridge engineering, a structural health monitoring (SHM) system accumulates data continuously, yet the whole system should be understood in real time. Data mining is applied as one step of the KDD process. This article proposes a regular rule for analyzing the SHM data from a real bridge. The data are intended to help engineers understand the degradation of the bridge system.
15

Zarate, Luis, Bruno Petrocchi, Carlos Dias Maia, Caio Felix, and Marco Paulo Gomes. "CAPTO - A method for understanding problem domains for data science projects." Concilium 23, no. 15 (2023): 922–41. http://dx.doi.org/10.53660/clm-1815-23m33.

Abstract:
Data science aims to infer knowledge from facts and evidence expressed as data. This occurs through a knowledge discovery in databases (KDD) process, which requires an understanding of the application domain. In practice, however, not enough time is spent on understanding this domain, and consequently the extracted knowledge may be incorrect or irrelevant. Considering that understanding the problem is an essential step in the KDD process, this work proposes the CAPTO method for understanding domains, based on knowledge management models, and, together with the available and acquired tacit and explicit knowledge, proposes a strategy for constructing conceptual models that represent the problem domain. This model contains the main dimensions (perspectives), aspects and attributes that may be relevant to start a data science project. As a case study, it is applied to the type 2 diabetes domain. Results show the effectiveness of the method. The conceptual model obtained through the CAPTO method can be used as an initial step for the conceptual selection of attributes.
16

Igoche, Bern Igoche, Olumuyiwa Matthew, Peter Bednar, and Alexander Gegov. "Integrating Structural Causal Model Ontologies with LIME for Fair Machine Learning Explanations in Educational Admissions." Journal of Computing Theories and Applications 2, no. 1 (2024): 65–85. http://dx.doi.org/10.62411/jcta.10501.

Abstract:
This study employed knowledge discovery in databases (KDD) to extract and discover knowledge from the Benue State Polytechnic (Benpoly) admission database and used a structural causal model (SCM) ontological framework to represent the admission process in the Nigerian polytechnic education system. The SCM ontology identified important causal relations in features needed to model the admission process and was validated using the conditional independence test (CIT) criteria. The SCM ontology was further employed to identify and constrain input features causing bias in the local interpretable model-agnostic explanations (LIME) framework applied to machine learning (ML) black-box predictions. The ablation process produced more stable LIME explanations devoid of fairness bias compared to LIME without ablation, with higher prediction accuracy (91% vs. 89%) and F1 scores (95% vs. 94%). The study also compared the performance of different ML models, including Gaussian Naïve Bayes, Decision Trees, and Logistic Regression, before and after ablation. The limitation is that the SCM ontology is qualitative and context-specific, so the fair-LIME framework can only be extrapolated to similar contexts. Future work could compare other explanation frameworks like Shapley on the same dataset. Overall, this study demonstrates a novel approach to enforcing fairness in ML explanations by integrating qualitative SCM ontologies with quantitative ML/LIME methods.
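
For readers unfamiliar with LIME, the usual tabular workflow the study builds on looks like the sketch below: fit a black-box model, then ask LIME for a local explanation of a single prediction. The toy dataset, feature names and model are assumptions, and the SCM-guided feature ablation itself is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
feature_names = ["grade_avg", "entry_score", "age", "gender_code"]  # assumed
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # admission driven by grades

model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                 class_names=["reject", "admit"],
                                 mode="classification")
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(exp.as_list())  # local feature weights for this one prediction
```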
17

He, Han, Yuanyuan Hong, Weiwei Liu, and Sung-A. Kim. "Data mining model for multimedia financial time series using information entropy." Journal of Intelligent & Fuzzy Systems 39, no. 4 (2020): 5339–45. http://dx.doi.org/10.3233/jifs-189019.

Abstract:
At present, KDD research covers many aspects and has achieved good results in the discovery of time series rules, association rules, classification rules and clustering rules. KDD has also been widely used in practical work such as OLAP and data warehousing. With the rapid development of network technology, web-based KDD research has likewise received more and more attention. The main research content of this paper is to analyze and mine time series data, obtain their inherent regularity, and use it in financial time series trading applications. In the financial field, there is a great deal of data; because of the huge amount, it is difficult for traditional processing methods to find the knowledge contained in it, and new knowledge and new technology are urgently needed to solve this problem. The application of KDD technology in the financial field has mainly focused on customer relationship analysis and management, while the mining of transaction data is rare. Practical work requires a tool to analyze transaction data and find its inherent regularity, in order to judge the nature and development trend of transactions. Therefore, this paper studies the application of KDD in financial time series data mining, explores an appropriate pattern mining method, and designs an experimental system that includes mining trading patterns, analyzing the nature of transactions and predicting their development trend, to promote the application of KDD in the financial field.
18

Shaaban, Amani Gomaa, Mohamed Helmy Khafagy, Mohamed Abbas Elmasry, Heba El-Beih, and Mohamed Hasan Ibrahim. "Knowledge discovery in manufacturing datasets using data mining techniques to improve business performance." Indonesian Journal of Electrical Engineering and Computer Science 26, no. 3 (2022): 1736–46. https://doi.org/10.11591/ijeecs.v26.i3.pp1736-1746.

Abstract:
Recently, due to the explosion in the data field, there is great interest in data science areas such as big data, artificial intelligence, data mining, and machine learning. Knowledge gives control and power in numerous manufacturing areas. Companies, factories, and other organizations aim to benefit from their huge recorded data, which increase and expand very quickly, to improve their business and the quality of their products. In this research paper, the knowledge discovery in databases (KDD) technique has been followed: "association rules" algorithms (the Apriori algorithm) and a chi-square automatic interaction detection (CHAID) analysis tree have been applied to real datasets belonging to the Emisal factory. This factory annually loses tons of production due to the breakdowns that occur daily inside the factory, which leads to a loss of profit. After analyzing and understanding the factory's production processes, we found that some breakdowns recur on many days during the product lifecycle; these breakdowns badly affect the production lifecycle, leading to a decrease in sales. So, we mined the data and used the methods mentioned above to build a predictive model that predicts breakdown types and helps the factory owner manage breakdown risks by taking accurate actions before breakdowns happen.
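
Association-rule mining of the kind described here is commonly done with the Apriori implementation in mlxtend. The self-contained sketch below, on invented breakdown "transactions", shows the typical support/confidence workflow rather than the factory's actual data or thresholds.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Made-up daily "transactions" of co-occurring breakdown types.
transactions = [
    ["pump_fail", "belt_wear"],
    ["pump_fail", "belt_wear", "overheat"],
    ["overheat"],
    ["pump_fail", "belt_wear"],
    ["belt_wear", "overheat"],
]

# One-hot encode the item sets, then mine frequent patterns and rules.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

frequent = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```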
19

GLAVAN, ALINA, and VICTOR CROITORU. "INCREMENTAL LEARNING FOR EDGE NETWORK INTRUSION DETECTION." REVUE ROUMAINE DES SCIENCES TECHNIQUES — SÉRIE ÉLECTROTECHNIQUE ET ÉNERGÉTIQUE 68, no. 3 (2023): 301–6. http://dx.doi.org/10.59277/rrst-ee.2023.3.9.

Abstract:
The paper presents incremental learning as a solution for adapting intrusion detection systems to dynamic edge network conditions. Extreme gradient boosted trees are proposed and evaluated with the Network Security Laboratory - Knowledge Discovery in Databases (NSL-KDD) benchmark dataset. The accuracy of the XGBoost classifier model improves by 15% with 1% of the KDD-test+ data used for training. A mechanism based on unsupervised learning that triggers retraining of the XGBoost classifier is suggested. These results are relevant in the context of model retraining in resource-scarce environments (relative to a cloud environment), such as the network edge or edge devices.
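
The retraining idea maps naturally onto XGBoost's ability to continue boosting from an existing model. Below is a minimal sketch on synthetic data, with parameters and batch sizes assumed and a reasonably recent xgboost presumed.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X0, y0 = rng.normal(size=(1000, 20)), rng.integers(0, 2, 1000)  # initial batch
X1, y1 = rng.normal(size=(100, 20)), rng.integers(0, 2, 100)    # ~1% new data

params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.3}

booster = xgb.train(params, xgb.DMatrix(X0, label=y0), num_boost_round=50)

# Incremental step: keep the existing trees and boost further on the new batch.
booster = xgb.train(params, xgb.DMatrix(X1, label=y1),
                    num_boost_round=10, xgb_model=booster)
print(booster.num_boosted_rounds())  # 60 trees after the update
```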
20

Eniodunmo, Oluwapelumi, and Raid Al-Aqtash. "A Predictive Model to Predict a Cyberattack Using Self Normalizing Neural Networks." International Journal of Statistics and Probability 12, no. 6 (2023): 60. http://dx.doi.org/10.5539/ijsp.v12n6p60.

Abstract:
A cyberattack is unauthorized access and a threat to information systems. Intelligent intrusion systems rely on advancements in technology to detect cyberattacks. In this article, the KDD CUP 99 dataset, from the Third International Knowledge Discovery and Data Mining Tools Competition held in 1999, is considered, and a class of neural networks, known as self-normalizing neural networks, is utilized to build a predictive model to detect cyberattacks in the KDD CUP 99 dataset. The accuracy and precision of the self-normalizing neural network are compared with those of k-nearest neighbors and support vector machines, in addition to other models in the literature. The self-normalizing neural network appears to perform better than other models in predicting cyberattacks, while also being efficient in predicting a normal connection.
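
Self-normalizing networks combine the SELU activation, LeCun-normal initialization and alpha dropout so that activations keep roughly zero mean and unit variance through depth, without batch normalization. A minimal PyTorch sketch of such a classifier head follows, with layer sizes assumed and KDD CUP 99 preprocessing omitted.

```python
import torch
import torch.nn as nn

n_features, n_classes = 41, 5  # KDD CUP 99-style dimensions (assumed)

# SELU + AlphaDropout keep activations self-normalizing through depth.
snn = nn.Sequential(
    nn.Linear(n_features, 64), nn.SELU(), nn.AlphaDropout(p=0.05),
    nn.Linear(64, 32), nn.SELU(), nn.AlphaDropout(p=0.05),
    nn.Linear(32, n_classes),
)

# LeCun-normal init: weights ~ N(0, 1/fan_in), as SELU's derivation assumes.
for m in snn.modules():
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=m.in_features ** -0.5)
        nn.init.zeros_(m.bias)

logits = snn(torch.randn(8, n_features))  # dummy forward pass
print(logits.shape)  # torch.Size([8, 5])
```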
21

Mirza, Ahmad Haidar. "Poverty Data Model as Decision Tools in Planning Policy Development." Scientific Journal of Informatics 5, no. 1 (2018): 39. http://dx.doi.org/10.15294/sji.v5i1.14022.

Abstract:
Poverty is the main problem in a country, in developing as well as developed countries, whether the poverty is structural, cultural or natural. That is, poverty is no longer seen merely as a measure of the government's failure to protect and fulfill the fundamental rights of its citizens, but as a challenge for the nation to realize a fair, prosperous, dignified and sovereign society. Various efforts have been made to determine government policy measures for overcoming poverty, one of them being surveys to assess the poor. The surveys conducted by various organizations have produced a variety of poverty database versions for different areas or locations, and the information generated from these poverty databases only includes recapitulations of poor people by area or location. One further step is to process the poverty data through a Knowledge Discovery in Databases (KDD) process to form a poverty data mining model. Data mining is a logical combination of knowledge of data and statistical analysis developed into business knowledge, or a process that uses statistical techniques, mathematics, artificial intelligence, and machine learning to extract and identify useful information and relevant knowledge from various large databases.
22

Balhara, Shreyansh. "DEVELOPING A DATA MINING BASED EFFICACIOUS PREDICTION MODEL OF DIABETICS AND AILED AILMENTS." International Journal of Research in Medical Sciences and Technology 11, no. 01 (2022): 216–21. http://dx.doi.org/10.37648/ijrmst.v11i01.021.

Abstract:
This model of recovering useful data and patterns from information is called KDD (Knowledge Discovery in Databases), which includes specific steps such as data selection, grouping and transformation. Machine learning analyses are categorized as supervised and unsupervised: a supervised learning analysis uses experience to predict new or unseen information, while unsupervised methods draw inferences from data sets. Supervised learning is also described as classification. This review uses clustering methods to deliver a more precise match with the class. The clustering analyses have been applied to the PIMA Indian Diabetes Dataset of the National Institute of Diabetes and Digestive and Kidney Diseases, which contains data on diabetic women.
23

Zapar, Rizky, Denni Pratama, Kaslani Kaslani, Cep Lukman Rohmat, and Faturrohman Faturrohman. "PENERAPAN MODEL REGRESI LINIER UNTUK PREDIKSI HARGA SAHAM BANK BCA PADA BURSA EFEK INDONESIA." JATI (Jurnal Mahasiswa Teknik Informatika) 8, no. 1 (2024): 196–202. http://dx.doi.org/10.36040/jati.v8i1.8215.

Abstract:
In an era of globalization and capital market complexity, stock price prediction analysis is a crucial element in the success of investors and companies. Erratic stock price fluctuations make forecasting market movements challenging. The KDD (Knowledge Discovery in Databases) methodology is used to extract valuable knowledge from historical stock price data of Bank BCA. Focusing on the linear regression method, this study aims to improve the accuracy of stock price prediction and validates the model using K-Fold cross-validation. Evaluation yields an RMSE of 0.032, indicating a low error rate and consistent model performance. An absolute error of 0.024 with a range of 0.007 indicates the model's ability to estimate stock price movements accurately. A root relative squared error of 0.138 with a range of 0.036 reflects the level of error relative to the data variation that the model can sustain. With a squared error of 0.001 and a range of 0.001, the distribution of prediction errors shows that most of the model's predictions fall within the range of the true values. The linear regression model with the KDD approach is able to predict Bank BCA's stock price with high accuracy and good consistency, providing a strong foundation for future investment decisions.
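
The validation scheme described, linear regression checked with K-Fold cross-validation and RMSE, looks roughly like the scikit-learn sketch below; the synthetic price series, lag features and fold count are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(1)
prices = np.cumsum(rng.normal(0, 0.01, 600)) + 30.0  # synthetic closing prices

# Predict tomorrow's price from the previous three closes (assumed lags).
X = np.column_stack([prices[0:-3], prices[1:-2], prices[2:-1]])
y = prices[3:]

cv = KFold(n_splits=5, shuffle=True, random_state=1)
rmse = -cross_val_score(LinearRegression(), X, y, cv=cv,
                        scoring="neg_root_mean_squared_error")
print(rmse.mean())  # average RMSE across the five folds
```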
24

Xu, Beijie, and Mimi Recker. "Understanding Teacher Users of a Digital Library Service: A Clustering Approach." Journal of Educational Data Mining 3, no. 1 (2011): 1–28. https://doi.org/10.5281/zenodo.3554702.

Abstract:
This article describes the Knowledge Discovery and Data Mining (KDD) process and its application in the field of educational data mining (EDM) in the context of a digital library service called the Instructional Architect (IA.usu.edu). In particular, the study reported in this article investigated a certain type of data mining problem, clustering, and used a statistical model, latent class analysis (LCA), to group the IA teacher users according to their diverse online behaviors. The use of LCA successfully helped us identify different types of users, ranging from window shoppers and lukewarm users to the most dedicated users, and distinguish the isolated users from the key brokers of this online community. The article concludes with a discussion of the implications of the discovered usage patterns for system design and for EDM in general.
25

Vinne, Vinne, Dina Ulitia Sinurat, and Yunika Prasetianti. "PENERAPAN ALGORITMA K-NEAREST NEIGHBORS (KNN) DALAM MENGANALISIS SENTIMEN ULASAN APLIKASI SEABANK PADA GOOGLE PLAY STORE." Journal of Information Systems Management and Digital Business 2, no. 2 (2025): 103–13. https://doi.org/10.70248/jismdb.v2i2.1508.

Abstract:
Using the K-Nearest Neighbors (KNN) algorithm, this study analyzes user reviews of the SeaBank application available on the Google Play Store. One thousand reviews were collected by scraping and classified as positive or negative based on the review scores. The process involved the Knowledge Discovery from Data (KDD) stages, including text preprocessing, TF-IDF weighting, and application of the KNN model. The results show that the model achieves 62.5% accuracy, performing better at classifying positive reviews than negative ones. This study provides valuable insights for developing SeaBank services to be more responsive to user needs.
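
The described pipeline, TF-IDF weighting followed by KNN classification, can be sketched with scikit-learn as follows; the tiny review list is invented, and Indonesian-specific preprocessing (stemming, stop-word removal) is omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Invented stand-ins for scraped reviews: 1 = positive, 0 = negative.
reviews = ["fast and easy transfers", "great app, very helpful",
           "app keeps crashing", "login fails, terrible service",
           "smooth experience", "slow and buggy"]
labels = [1, 1, 0, 0, 1, 0]

# TF-IDF turns each review into a weighted term vector; KNN votes by proximity.
model = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=3))
model.fit(reviews, labels)

print(model.predict(["very helpful and fast", "always crashing"]))
```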
26

Wei, Jun Ying, Pei Si Zhong, and Chun Fen Guo. "Ontology Modeling of Manufacturing Resources." Key Engineering Materials 474-476 (April 2011): 1621–25. http://dx.doi.org/10.4028/www.scientific.net/kem.474-476.1621.

Abstract:
Considering the current situation of manufacturing enterprises' resources, the definition and classification of manufacturing resources are given. Combining this with the advantages of ontology for knowledge reuse and sharing, an ontology classification hierarchy tree of standardized part resources is defined. By analyzing the property information of rolling bearings, an ontology model is established based on OWL. The OWL model better reflects the essential relationships and semantic relations between concepts, providing a good informatization foundation for manufacturing resource discovery, reuse and sharing.
27

Sofyan, Fazrin Meila Azzahra, Affani Putri Riyandoro, Devi Fitriani Maulana, and Jajam Haerul Jaman. "Penerapan Data Mining dengan Algoritma C5.0 Untuk Prediksi Penyakit Stroke." J-SISKO TECH (Jurnal Teknologi Sistem Informasi dan Sistem Komputer TGD) 6, no. 2 (2023): 619. http://dx.doi.org/10.53513/jsk.v6i2.8578.

Abstract:
Stroke is a condition that affects the nervous system and can have serious consequences for a person's health. The WHO reports 13.7 million cases each year, of which 5.5 million result in death. The aim of this study is to develop a prediction model that can help with early identification of stroke risk. The method used is Knowledge Discovery in Databases (KDD) with the C5.0 algorithm, an effective classification algorithm for processing data with both numerical and categorical attributes. The KDD method consists of several stages: selection, preprocessing, transformation, data mining, and evaluation. The C5.0 algorithm itself is a classification algorithm in data mining used specifically in decision tree techniques. The data used in this study is a dataset containing medical information and risk factors associated with stroke. The result of this study is a decision tree whose accuracy, recall, and precision were measured using an 80% (training) / 20% (testing) data split, yielding 95% accuracy, 96% recall, and 99% precision.
28

V. Rafael, Felicisima. "Predicting Job Change among Data Scientists using Machine Learning Technique." 14th GCBSS Proceeding 2022 14, no. 2 (2022): 1. http://dx.doi.org/10.35609/gcbssproceeding.2022.2(77).

Abstract:
In the knowledge- and data-driven economy, countless ramifications are attributed to the great contribution of data scientists in transforming business and industries by using various data science tools to recognize and generate patterns in data points and produce insights. The study aimed to apply data science to human resources and generate actionable intelligence and HR analytics to better understand employees' perception of the company and work environment. The researcher used the processes of the Knowledge Discovery in Databases (KDD) method. Knowledge discovery in databases is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns or relationships within a dataset (10,000 examples, 0 special attributes, and 14 regular attributes) to make important decisions. RapidMiner was used to perform the KDD processes of selection, preprocessing, data transformation, and data mining using a machine learning algorithm. Accordingly, Decision Tree was found to be the learning algorithm fit for the ExampleSet. Further, among the 14 attributes, the most important attribute to split on was the city_development_index. This implies that the best predictor variable for job change among data scientists was the city_development_index. Consequently, the prediction model has 92.1% confidence that a male who works in a city with a development index of 0.920, with relevant data science experience, not presently enrolled in university, a high school graduate, with 5 years of work experience, presently working in a funded start-up company with 50-99 employees, working for the first time, with training hours = 24, was predicted to "Not Change" jobs. The model has 77.78% accuracy and 81.70% precision. Keywords: Data Scientist, Data Science, Job Change, Human Resource Analytics
29

Olanrewaju, Oyenike Mary, Faith Oluwatosin Echobu, and Abubakar Mogaji. "MODELLING OF AN INTRUSION DETECTION SYSTEM USING C4.5 MACHINE LEARNING ALGORITHM." FUDMA JOURNAL OF SCIENCES 4, no. 4 (2021): 454–59. http://dx.doi.org/10.33003/fjs-2020-0404-502.

Abstract:
The increasing growth of wireless networking and new mobile computing devices has blurred the boundaries between trusted and malicious users. The shift in security priorities from the network perimeter to information protection and user resource security is an open research area concerned with protecting the confidentiality, integrity and availability of user information. Intrusion detection systems are programs or software applications embedded in sophisticated devices that monitor the activities on networks or systems to detect security, policy or protocol violations or malicious activities. In this work, an intrusion detection model is proposed using the C4.5 algorithm, implemented with the WEKA tool and RapidMiner. The model showed good performance when trained and tested with validation techniques. The proposed model was evaluated on the Network Security Laboratory Knowledge Discovery in Databases (NSL-KDD) dataset, an improved version of the KDD 99 dataset, and achieved an average detection rate of 99.62% and a reduced false alarm rate of 0.38%.
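
C4.5 itself ships with WEKA as J48; in Python, an entropy-based scikit-learn decision tree is the nearest readily available analogue. The sketch below trains one on synthetic connection records shaped loosely like NSL-KDD features, as an illustration of the modeling step rather than the paper's exact setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for NSL-KDD connection features (normal vs. attack).
X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                           weights=[0.6, 0.4], random_state=7)

# criterion="entropy" splits on information gain, as C4.5 does (without
# C4.5's gain-ratio correction and rule post-pruning).
clf = DecisionTreeClassifier(criterion="entropy", max_depth=12, random_state=7)
scores = cross_val_score(clf, X, y, cv=10)  # cross-validated accuracy
print(scores.mean())
```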
30

Abdul Razak, Rohaila, Mazni Omar, and Mazida Ahmad. "A Student Performance Prediction Model Using Data Mining Technique." International Journal of Engineering & Technology 7, no. 2.15 (2018): 61. http://dx.doi.org/10.14419/ijet.v7i2.15.11214.

Abstract:
Predicting performance is very significant in the education world nowadays. This paper describes the process of predicting student performance using a data mining technique. 257 records were taken from semester 6 students at KPTM, covering four academic programs: Diploma in Computer System and Networking, Diploma in Information Technology, Diploma in Business Management and Diploma in Accountancy. Knowledge Discovery in Databases (KDD) was used as a guide to the process of finding and extracting knowledge from the dataset. A decision tree and linear regression were used to analyze the dataset based on the selected variables. The variables used are Gender, Financing, SPM, GPASem1, GPASem2, GPASem3, GPASem4 and GPASem5, with CGPA as the dependent variable. The results indicate the variables that contribute most to student performance. Based on the analysis, the decision tree shows that GPASem1 has a strong significant relationship with the final-semester CGPA, with a prediction accuracy of 82%. The linear regression shows that the GPA for each semester is highly significant with respect to the dependent variable, with 96.2% prediction accuracy. With this information, the management of KPTM can plan to ensure that students maintain good results and, at the same time, make strategic plans for those without good results.
31

Suryono, Michael Saputra, and Raymond Oetama. "Peramalan terhadap Forex dengan Metode ARIMA Studi Kasus GBP/USD." Ultimatics : Jurnal Teknik Informatika 11, no. 1 (2019): 6–10. http://dx.doi.org/10.31937/ti.v11i1.1238.

Abstract:
Forex, or foreign exchange, is the trading of one country's currency for another country's currency. The purpose of this study is to test the accuracy of ARIMA on the GBP/USD currency pair. In addition, this research is expected to provide useful knowledge about forecasting with ARIMA. This study produced forecasts of the GBP/USD currency pair one month ahead, for each of the six months from January 2018 to June 2018, using the ARIMA method and R software. The data used were taken from January 2013 to June 2018, and the process followed the KDD (Knowledge Discovery in Databases) process. The ARIMA(3,2,1) model was obtained as the best model to apply, one month ahead over the six months, for the GBP/USD currency pair, because it had the lowest AIC value and a mean absolute percentage error of 3.16%.
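
Fitting and forecasting an ARIMA(3,2,1), the order selected in this study, takes a few lines with statsmodels (the study itself used R); the synthetic series stands in for GBP/USD closes and the horizon is an assumed example.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
rates = 1.30 + np.cumsum(rng.normal(0, 0.002, 500))  # synthetic GBP/USD closes

# Fit the order selected in the study: AR=3, differencing=2, MA=1.
result = ARIMA(rates, order=(3, 2, 1)).fit()
print(result.aic)                 # criterion used to pick the best model
print(result.forecast(steps=21))  # ~1 month of daily forecasts (assumed horizon)
```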
32

Primajaya, Aji, Betha Nurina Sari, and Ahmad Khusaeri. "Prediksi Potensi Kebakaran Hutan dengan Algoritma Klasifikasi C4.5 Studi Kasus Provinsi Kalimantan Barat." Jurnal Edukasi dan Penelitian Informatika (JEPIN) 6, no. 2 (2020): 188. http://dx.doi.org/10.26418/jp.v6i2.37834.

Abstract:
The C4.5 algorithm is a classification algorithm that can be applied to the case of predicting forest fire potential. To understand how C4.5 performs on forest fire prediction, research on this application is needed. The methodology used is Knowledge Discovery in Databases (KDD), whose stages consist of data collection and selection, data preprocessing, data transformation, data processing with the C4.5 algorithm, and finally the interpretation and evaluation of the resulting knowledge. Percentage split, cross-validation, and use-training-set techniques were employed to divide the training and testing data under several percentage scenarios, and the best model was selected. Accuracy was used as the evaluation indicator. The study concludes that C4.5 with a percentage split of 80% training data and 20% testing data yields the highest accuracy, 89.7859%.
33

Zidane, M. Yazid, Betha Nurina Sari, Iqbal Maulana, Aji Primaya, and Garno Garno. "PENERAPAN DATA MINING DALAM KLASIFIKASI DATA TRANSAKSI PRODUK KOPERASI DI SMK PGRI 2 KARAWANG." JATI (Jurnal Mahasiswa Teknik Informatika) 9, no. 1 (2024): 263–69. https://doi.org/10.36040/jati.v9i1.12196.

Abstract:
The cooperative at SMK PGRI 2 Karawang produces daily transaction data that is often merely documented without further use, even though it could be used to optimize sales and reduce losses. This study aims to apply the Naive Bayes algorithm to classify cooperative transaction data in order to predict profits and losses. The data consist of 774 transactions from January 2023 to August 2024, covering attributes such as date, product name, category, quantity sold, price, and total sales. The research methodology follows Knowledge Discovery in Databases (KDD), including data selection, preprocessing, transformation, data mining, and model evaluation. Evaluation shows that the Naive Bayes model reaches 98.97% accuracy at a test size of 0.5 with an F1-score of 0.99 in classifying fast- and slow-selling products, thereby improving the efficiency of cooperative transaction management.
34

Pérez, Francisco Maciá, Jose Vicente Berna Martienz, Alberto Fernández Oliva, and Miguel Abreu Ortega. "Application of the Variable Precision Rough Sets Model to Estimate the Outlier Probability of Each Element." Complexity 2018 (October 8, 2018): 1–14. http://dx.doi.org/10.1155/2018/4867607.

Abstract:
In a data mining process, outlier detection aims to use the high marginality of these elements to identify them by measuring their degree of deviation from representative patterns, thereby yielding relevant knowledge. Rough sets (RS) theory has been applied to the field of knowledge discovery in databases (KDD) since its formulation in the 1980s; in recent years, outlier detection has been increasingly regarded as a KDD process with its own usefulness. The application of RS theory as a basis to characterise and detect outliers is a novel approach with great theoretical relevance and practical applicability. However, algorithms whose spatial and temporal complexity allows their application to realistic scenarios involving vast amounts of data and requiring very fast responses are difficult to develop. This study presents a theoretical framework based on a generalisation of RS theory, termed the variable precision rough sets model (VPRS), which allows the establishment of a stochastic approach to solving the problem of assessing whether a given element is an outlier within a specific universe of data. An algorithm derived from quasi-linearisation is developed based on this theoretical framework, thus enabling its application to large volumes of data. The experiments conducted demonstrate the feasibility of the proposed algorithm, whose usefulness is contextualised by comparison to different algorithms analysed in the literature.
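
The rough-membership function underlying VPRS is straightforward to compute: for each element, take its equivalence class under the condition attributes and measure what fraction of the class lies in the target set. The toy sketch below illustrates that computation and a simple outlier score derived from it; it is not the authors' quasi-linearisation algorithm, and the data and β value are assumptions.

```python
import pandas as pd

# Toy data: condition attributes A, B; decision d marks the target set X.
df = pd.DataFrame({
    "A": [0, 0, 0, 1, 1, 1, 1, 2],
    "B": [0, 0, 0, 1, 1, 1, 1, 0],
    "d": [1, 1, 1, 1, 1, 1, 0, 0],
})

# Rough membership: share of each (A, B)-equivalence class inside X (d == 1).
mu = df.groupby(["A", "B"])["d"].transform("mean")

beta = 0.8  # VPRS precision threshold (assumed)
df["membership"] = mu
df["in_beta_positive_region"] = mu >= beta  # classified into X at precision beta
# Elements whose own label disagrees with their class majority look anomalous.
df["outlier_score"] = df["d"] * (1 - mu) + (1 - df["d"]) * mu
print(df)
# Row 6 (d=0 inside a mostly-positive class) gets the highest outlier score.
```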
35

Muzakir, Ari, and Rika Anisa Wulandari. "Model Data Mining sebagai Prediksi Penyakit Hipertensi Kehamilan dengan Teknik Decision Tree." Scientific Journal of Informatics 3, no. 1 (2016): 19–26. http://dx.doi.org/10.15294/sji.v3i1.4610.

Abstract:
The prevalence of hypertension in pregnant women was 1,062 cases (12.7%). Of these 1,062 cases of pregnant women with hypertension, 125 (11.8%) had been diagnosed with hypertension by health workers. RSIA YK Madira Palembang, as a health center, needs to develop a method that can predict the high risk of hypertension in pregnant women from antenatal examination data. Using a data source consisting of antenatal care records, a data mining technique with the C4.5 decision tree algorithm was applied, following the Knowledge Discovery in Databases (KDD) process, so that knowledge, information, and hidden patterns predictive of hypertension in pregnancy could be found in the antenatal care data. The method used is the C4.5 algorithm. After obtaining a decision tree and rules that can predict hypertensive disease in pregnancy, evaluation with a supplied test set in WEKA produced an error rate of 7.3427% and an accuracy of 92.6573%. Of the 286 training instances, 265 were predicted accurately and 21 incorrectly.
36

Firdaniza, Firdaniza, Budi Nurani Ruchjana, Diah Chaerani, and Jaziar Radiant. "Information diffusion model with homogeneous continuous time Markov chain on Indonesian Twitter users." International Journal of Data and Network Science 6, no. 3 (2022): 659–68. http://dx.doi.org/10.5267/j.ijdns.2022.4.006.

Abstract:
In this paper, a homogeneous continuous time Markov chain (CTMC) is used to model information diffusion, or dissemination, and to determine influencers on Twitter dynamically. The tweeting process can be modeled with a homogeneous CTMC since the properties of Markov chains are fulfilled: in this case, the tweets received by followers depend only on the tweets from the previous followers. Knowledge Discovery in Databases (KDD) from data mining is used as the research methodology, including pre-processing, the data mining process using a homogeneous CTMC, and post-processing to identify the influencers using a visualization that predicts the number of affected users. We assume the number of affected users follows a logarithmic function. Our study examines Indonesian Twitter data with tweets about COVID-19 vaccination, resulting in influencer rankings that change dynamically over time. The results also show that the users with the highest number of followers are not necessarily the top influencers.
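
A homogeneous CTMC of diffusion can be simulated directly: from each state the holding time is exponential with the total outflow rate, after which the chain jumps. The pure-birth sketch below, with all rates and the saturation form assumed, models the number of affected users growing as followers pass a tweet on.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_diffusion(rate_per_user=0.4, n_users=200, horizon=24.0):
    """Pure-birth CTMC: state k = affected users; the jump rate k -> k+1 is
    rate_per_user * k * (1 - k / n_users) (an illustrative saturating form)."""
    t, k, path = 0.0, 1, [(0.0, 1)]
    while k < n_users:
        rate = rate_per_user * k * (1 - k / n_users)
        if rate <= 0:
            break
        t += rng.exponential(1 / rate)   # exponential holding time in state k
        if t > horizon:
            break
        k += 1
        path.append((t, k))
    return path

path = simulate_diffusion()
print(f"affected users after 24h: {path[-1][1]}")
```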
37

Alif Prayudha, Bimo, Rudi Kurniawan, Yudhistira Wijaya, and Umi Hayati. "ALGORITMA K-MEANS UNTUK MENINGKATKAN MODEL KLASTERISASI DATA SISWA SMK SAMUDRA NUSANTARA KABUPATEN CIREBON BERDASARKAN NILAI AKADEMIK." JATI (Jurnal Mahasiswa Teknik Informatika) 9, no. 1 (2025): 1314–21. https://doi.org/10.36040/jati.v9i1.12689.

Abstract:
Managing large and complex student academic score data is a significant challenge in education, especially in vocational schools (SMK), which focus on preparing students for the workforce. Poorly organized data often hampers data-driven decision making. The K-Means algorithm was chosen for this study because of its effectiveness in analyzing data and grouping hidden patterns. The study applies the Knowledge Discovery in Databases (KDD) methodology, covering data selection, preprocessing, transformation, clustering, evaluation with the Davies-Bouldin Index (DBI), and interpretation of results. The dataset consists of students' academic scores in core subjects such as Mathematics, English, and Chemistry. The results show an ideal clustering of two groups with a DBI of 0.519, in which the Chemistry attribute had the most significant influence. This clustering provides deep insight into students' academic patterns, supports data-driven learning strategies, and helps the school formulate more effective education policies.
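
The clustering-plus-DBI loop reads naturally in scikit-learn: fit K-Means for several k and keep the k with the lowest Davies-Bouldin Index (lower is better). Synthetic subject scores stand in for the students' grades below.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(4)
# Synthetic grades in three subjects (e.g., Mathematics, English, Chemistry).
scores = np.vstack([rng.normal(70, 5, (60, 3)), rng.normal(85, 5, (40, 3))])

best_k, best_dbi = None, np.inf
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=4).fit_predict(scores)
    dbi = davies_bouldin_score(scores, labels)
    if dbi < best_dbi:
        best_k, best_dbi = k, dbi

print(best_k, round(best_dbi, 3))  # lowest DBI marks the preferred k
```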
38

Abdul mukhsyi, Sopian, Ade Irma Purnamaari, Agus Bahtiar, and Kaslani. "Improving Student Achievement Clustering Model Using K-Means Algorithm in Pasundan Majalaya Vocational School." Journal of Artificial Intelligence and Engineering Applications (JAIEA) 4, no. 2 (2025): 977–85. https://doi.org/10.59934/jaiea.v4i2.793.

Abstract:
This study analyzes and enhances the student achievement clustering model at SMK Pasundan Majalaya using the K-Means algorithm. The Knowledge Discovery in Databases (KDD) method and RapidMiner AI Studio 2024.1.0 were used to process data from 125 students based on 15 metrics, including academic scores and attendance rates. For group evaluation, the Elbow method and Davies-Bouldin Index (DBI) were employed. The results showed optimal clustering with 2 groups and a DBI value of 0.893. Analysis results revealed significant differences in characteristics between the two groups. Cluster_1 consists of 38 students and has lower score patterns (60-80), with attendance rates of 94-100%, and a positive correlation between attendance and academic achievement. On the other hand, Cluster_0 consists of 86 students and shows higher score patterns (67.5-87.5), with attendance rates of 80-100%, and demonstrates a positive correlation between attendance and academic achievement. Schools can use this clustering model to create learning approaches that are better suited to each student group.
APA, Harvard, Vancouver, ISO, and other styles
40

Arisa, Arisa, Rudi Kurniawan, and Umi Hayati. "ALGORITMA RANDOM FORETS UNTUK PENINGKATAN MODEL KLASIFIKASI PADA DATA DIAGNOSA PASIEN PUSKESMAS PEKALANGAN KOTA CIREBON." JATI (Jurnal Mahasiswa Teknik Informatika) 9, no. 1 (2025): 1224–31. https://doi.org/10.36040/jati.v9i1.12676.

Full text
Abstract:
Puskesmas (community health center) Pekalangan in Cirebon City faces challenges in using patient diagnosis data effectively to support medical decisions. Imbalanced and complex data often hampers the accuracy of diagnosis classification. This study aims to improve the accuracy of the patient diagnosis classification model using the Random Forest algorithm with optimization of the "number of trees" and "max depth" parameters. The research method adopts the Knowledge Discovery in Databases (KDD) approach, covering data pre-processing, feature selection, and model evaluation. The dataset consists of 3,769 medical records of Puskesmas Pekalangan patients from January to June 2024. The results show that the optimal "number of trees" is 28 and the optimal "max depth" is 10, together yielding a model accuracy of 76.39%. In addition, the "main complaint" attribute proved to be the most influential predictor, reaching 53.58% accuracy on its own. These findings underline the importance of proper parameter and feature selection in improving the efficiency and reliability of classification models. The model is expected to support more accurate and faster medical decision making at the community health center level.
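A hedged sketch of the parameter tuning this abstract reports, using scikit-learn's RandomForestClassifier with a grid over "number of trees" and "max depth"; the features and labels below are synthetic stand-ins for the encoded medical records:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder features standing in for the encoded medical-record attributes.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10))
y = rng.integers(0, 3, size=500)        # hypothetical diagnosis classes

# Tune the two parameters highlighted in the abstract.
grid = GridSearchCV(
    RandomForestClassifier(random_state=3),
    param_grid={"n_estimators": [10, 28, 50], "max_depth": [5, 10, None]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, f"accuracy={grid.best_score_:.4f}")
```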
APA, Harvard, Vancouver, ISO, and other styles
41

Nurdy, Awang Herjunie, Abdul Rahim, and Arbansyah. "Analisis Sentimen Ulasan Game Stumble Guys Pada Playstore Menggunakan Algoritma Naïve Bayes." Teknika 13, no. 3 (2024): 388–95. http://dx.doi.org/10.34148/teknika.v13i3.993.

Full text
Abstract:
Rapid technological development has eased access to various forms of digital entertainment, including online games such as Stumble Guys, which has been downloaded more than 163 million times and has received mixed reviews on the Google Play Store. This study aims to analyze the sentiment of Stumble Guys user reviews using the Naïve Bayes algorithm. The research method follows the stages of Knowledge Discovery in Databases (KDD): data selection, preprocessing, transformation with CountVectorizer and TF-IDF, and classification with Naïve Bayes. Using 1,500 reviews from the Google Play Store, the Naïve Bayes model achieved 86% accuracy, with precision, recall, and F1-score of 86% each. The results show that Naïve Bayes is effective in classifying the sentiment of Stumble Guys game reviews.
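A minimal sketch of the transformation-plus-classification stage, assuming a scikit-learn TF-IDF vectorizer feeding a multinomial Naïve Bayes model; the toy reviews below are invented, not the 1,500 scraped ones:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labeled reviews; the study used 1,500 scraped Play Store reviews.
texts = ["great game, so fun", "keeps crashing, awful",
         "love the new maps", "laggy and full of ads"]
labels = ["positive", "negative", "positive", "negative"]

# TF-IDF transformation feeding a multinomial Naïve Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["fun but a bit laggy"]))
```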
APA, Harvard, Vancouver, ISO, and other styles
42

Burhansyah, Farhan Nugroho, and Yan Sofyan Andhana Saputra. "ANALISIS SENTIMEN KOMENTAR INSTAGRAM TERHADAP WACANA KEBIJAKAN ELECTRONIC ROAD PRICING (ERP) DI JAKARTA MENGGUNAKAN ALGORITMA NAÏVE BAYES." Jurnal Sains & Teknologi Fakultas Teknik Universitas Darma Persada 14, no. 1 (2024): 99–107. http://dx.doi.org/10.70746/jstunsada.v14i1.508.

Full text
Abstract:
The DKI Jakarta government plans to manage traffic by means of Electronic Road Pricing (ERP). To gauge social media users' response to this policy, sentiment analysis was conducted on comments from the Instagram application, using the Naïve Bayes algorithm. The research design follows the Knowledge Discovery in Databases (KDD) methodology. The study used the Apify web scraper to collect 1,486 Instagram comments on posts from 2023, with Python as the programming language. The analysis classified the comments into three sentiment classes: 97 positive, 568 negative, and 743 neutral labels. To measure the algorithm's accuracy, tests were run with four train-test split models: 90:10, 80:20, 70:30, and 60:40. Accuracy testing with a confusion matrix showed the highest accuracy for the 90:10 model, while testing with the ROC curve showed the highest AUC, 0.7335, for the 90:20 model, so the model can be considered good at distinguishing between the classes.
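The split-comparison protocol can be sketched as follows on synthetic data (the real vectorized comments are not available here); the loop mirrors the 90:10 through 60:40 models and prints accuracy plus a confusion matrix for each:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Placeholder feature matrix standing in for the vectorized Instagram comments.
X, y = make_classification(n_samples=1486, n_features=20, n_classes=3,
                           n_informative=5, random_state=4)

# Compare the four train:test splits used in the study.
for test_size in (0.10, 0.20, 0.30, 0.40):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size,
                                              random_state=4)
    y_pred = GaussianNB().fit(X_tr, y_tr).predict(X_te)
    print(f"split {int((1 - test_size) * 100)}:{int(test_size * 100)} "
          f"accuracy={accuracy_score(y_te, y_pred):.3f}")
    print(confusion_matrix(y_te, y_pred))
```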
APA, Harvard, Vancouver, ISO, and other styles
43

Beryl Enrico Ritonga, Samuel, and Ultach Enri. "PERBANDINGAN ALGORITME C4.5 DAN NAÏVE BAYES DALAM KLASIFIKASI SEVERE PREEKLAMPSIA MENGGUNAKAN HEMATOLOGI." JATI (Jurnal Mahasiswa Teknik Informatika) 8, no. 2 (2024): 2200–2207. http://dx.doi.org/10.36040/jati.v8i2.9435.

Full text
Abstract:
This study compares the Naïve Bayes and C4.5 classification methods for identifying preeclampsia cases based on hematology data, using the 2021-2023 dataset from Bayukarta Hospital, Karawang. The decision tree (C4.5) achieved 75% accuracy versus 71% for Naïve Bayes, with class-1 F1-scores of 0.83 and 0.80, respectively. The C4.5 model was deployed as a simple web application built with Flask and TailwindCSS for predicting new data. The methodology used is KDD (Knowledge Discovery in Databases), which supports structured and directed data mining from start to finish. Suggested future work includes deeper analysis of preeclampsia features, accuracy improvement, exploration of data processing methods, and the use of larger datasets; incorporating additional risk factors and applying ensemble learning are also recommended. This research is expected to improve the understanding and prevention of preeclampsia and to increase model accuracy in early pregnancy.
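A rough analogue of the comparison, assuming scikit-learn: since C4.5 itself is not available there, an entropy-criterion decision tree stands in for it, and a public dataset replaces the hospital's hematology records:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Public stand-in for the hospital's hematology dataset.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)

# scikit-learn has no C4.5; an entropy-criterion tree is the closest analogue.
for name, clf in [("C4.5-style tree",
                   DecisionTreeClassifier(criterion="entropy", random_state=5)),
                  ("Naive Bayes", GaussianNB())]:
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: acc={accuracy_score(y_te, y_pred):.2f}, "
          f"F1(class 1)={f1_score(y_te, y_pred):.2f}")
```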
APA, Harvard, Vancouver, ISO, and other styles
44

Anggraeni, Anggi, Rudi Kurniawan, Yudhistira Wijaya, and Umi Hayati. "ALGORITMA FP-GROWTH UNTUK MENINGKATKAN MODEL ASOSIASI PADA DATA PERMINTAAN BARANG LOGISTIK RUMAH SAKIT XXX JAKARTA." JATI (Jurnal Mahasiswa Teknik Informatika) 9, no. 1 (2025): 1306–13. https://doi.org/10.36040/jati.v9i1.12688.

Full text
Abstract:
Hospital logistics management requires analyzing demand patterns to improve operational efficiency. This study uses the FP-Growth algorithm to identify association patterns among logistics items based on the 2023 demand data of XXX Hospital, Jakarta. The algorithm was chosen for its ability to mine frequent itemsets efficiently and accurately. The data comprise 1,433 demand transaction entries, covering item name, quantity, and demand frequency. The research follows the Knowledge Discovery in Databases (KDD) stages: data pre-processing, transformation, analysis with FP-Growth, and evaluation of the results. The analysis found that Tissue Hand Towel had the highest support (0.645), followed by Tissue Toilet Roll (0.174). The association between the two shows 79.4% confidence, indicating a strong co-ordering relationship. The results show that applying FP-Growth can provide strategic insight for logistics management, such as more efficient stock control, reduced waste, and timely item availability. Developing an automated system based on this algorithm is recommended to improve the efficiency and accuracy of logistics management in the health sector.
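A minimal sketch of the FP-Growth step, assuming the third-party mlxtend library; the toy transactions are invented stand-ins for the 1,433 hospital demand records:

```python
import pandas as pd
from mlxtend.frequent_patterns import association_rules, fpgrowth
from mlxtend.preprocessing import TransactionEncoder

# Toy transactions; the study mined 1,433 hospital demand records.
transactions = [
    ["hand towel", "toilet roll"],
    ["hand towel", "soap"],
    ["hand towel", "toilet roll", "gloves"],
    ["toilet roll"],
]
te = TransactionEncoder()
df = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

# Frequent itemsets by support, then rules filtered by confidence.
itemsets = fpgrowth(df, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```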
APA, Harvard, Vancouver, ISO, and other styles
45

Nurohmah, Yunita, Rini Mayasari, and Betha Nurina Sari. "OPTIMALISASI PERFORMA K-MEANS CLUSTERING DENGAN PCA DALAM ANALISIS TINGKAT KEMISKINAN DI JAWA BARAT." JATI (Jurnal Mahasiswa Teknik Informatika) 7, no. 3 (2023): 1657–65. http://dx.doi.org/10.36040/jati.v7i3.6884.

Full text
Abstract:
Poverty is a complex problem that is difficult to tackle, especially in West Java Province, which has one of the highest poverty rates in Indonesia. Accurate poverty data is a key factor in supporting alleviation strategies, and the poverty level of a region can be measured more easily through well-chosen supporting indicators. This study aims to cluster poverty levels in West Java Province from 2019 to 2022 using the Knowledge Discovery in Databases (KDD) methodology. The method combines two algorithms, Principal Component Analysis (PCA) and K-Means clustering, and uses the silhouette coefficient to evaluate the resulting models. The evaluation of the two model scenarios shows that building the model with a dimensionality-reduction step first, followed by K-Means clustering (scenario 2), yields the best performance, with an evaluation score of 0.74, which falls in the "strong" category.
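Scenario 2 from this abstract (dimensionality reduction first, then clustering, then silhouette evaluation) can be sketched like this on placeholder indicator data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
X = rng.normal(size=(27, 8))   # placeholder: poverty indicators per regency/city

# Scenario 2: reduce dimensions first, then cluster the reduced data.
X_scaled = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X_scaled)
labels = KMeans(n_clusters=3, n_init=10, random_state=6).fit_predict(X_pca)
print("silhouette:", silhouette_score(X_pca, labels))
```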
APA, Harvard, Vancouver, ISO, and other styles
46

Mirna, Mirna, Martanto Martanto, Arif Rinaldi Dikananda, and Ahmad Rifa’i. "ALGORITMA K-MEANS UNTUK MENGELOMPOKAN BANTUAN SOSIAL DALAM PENENTUAN STRATEGI DISTRIBUSI DESA SIGONG." Jurnal Informatika Teknologi dan Sains (Jinteks) 7, no. 2 (2025): 482–91. https://doi.org/10.51401/jinteks.v7i2.5141.

Full text
Abstract:
The distribution of social assistance in Sigong Village, Cirebon Regency, is often hampered by poor targeting. This study aims to optimize the grouping of social assistance recipients using the K-Means algorithm and the Knowledge Discovery in Databases (KDD) methodology. Socio-economic data on 500 families were analyzed through data selection, pre-processing, transformation, clustering, and model evaluation with the Davies-Bouldin Index (DBI). The results show two optimal clusters with a DBI of 0.614: the first cluster (945 recipients) with low need and the second cluster (139 recipients) with high need. Data-driven recommendations based on these clusters improve the accuracy of a fairer and more efficient aid distribution. The results show that the K-Means algorithm can be an effective tool for supporting data-driven decision making toward well-targeted social policy.
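A hedged sketch of how the two clusters might be interpreted by their centroids to separate higher-need from lower-need families; the features and values below are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Placeholder socio-economic features: e.g. income, dependents, housing score.
X = np.vstack([rng.normal([2, 1, 3], 0.5, (60, 3)),
               rng.normal([5, 4, 7], 0.5, (40, 3))])

km = KMeans(n_clusters=2, n_init=10, random_state=7).fit(X)

# Interpret clusters by their centroids: the higher-need group can then
# be prioritized in the distribution strategy.
for i, center in enumerate(km.cluster_centers_):
    size = int(np.sum(km.labels_ == i))
    print(f"cluster {i}: {size} families, centroid={np.round(center, 2)}")
```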
APA, Harvard, Vancouver, ISO, and other styles
47

Wicaksono, Yanuar. "SEGMENTASI PELANGGAN BISNIS DENGAN MULTI KRITERIA MENGGUNAKAN K-MEANS." Indonesian Journal of Business Intelligence (IJUBI) 1, no. 2 (2019): 45. http://dx.doi.org/10.21927/ijubi.v1i2.872.

Full text
Abstract:
Customer knowledge is an important asset; gathering, managing, and sharing it turns customer knowledge into valuable capital for the company. This drives companies to keep innovating in developing products and serving customers according to their needs. To understand the needs of each customer, a company needs customer segmentation. Customer segmentation is defined as dividing customers into distinct groups with similar characteristics in order to develop marketing strategies tailored to those characteristics. The simplest, best-known, and most commonly used model of customer characteristics is the recency, frequency, monetary (RFM) model. The RFM model, however, has weaknesses: low segmentation capacity and no information on the continuity of customer transactions for understanding customer loyalty. The research method used is Knowledge Discovery in Databases (KDD). The data are transformed into a format suited to the analysis, and customers are then segmented with the K-Means clustering algorithm. The experiments show that the RFM model labels customers as loyal when recency, frequency, and monetary values are high. In reality, recency only records when a customer last transacted, and a high transaction frequency can be reached without the customer transacting steadily in each period. Using multiple criteria in customer segmentation can therefore outperform RFM criteria alone, so customers are treated appropriately according to the groups that are formed.
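A minimal sketch of building RFM features from a transaction log and adding one extra "periodicity" criterion as a stand-in for the paper's multi-criteria idea (the exact additional criteria used in the study are not specified here), then clustering with K-Means:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy transaction log; real input would be the company's sales records.
tx = pd.DataFrame({
    "customer": ["A", "A", "B", "C", "C", "C"],
    "date": pd.to_datetime(["2019-01-05", "2019-03-01", "2019-02-10",
                            "2019-01-20", "2019-02-18", "2019-03-15"]),
    "amount": [120, 80, 40, 200, 150, 90],
})

now = tx["date"].max()
# RFM plus a 'periods' criterion (distinct active months) capturing
# transaction continuity, which plain RFM misses.
rfm = tx.groupby("customer").agg(
    recency=("date", lambda d: (now - d.max()).days),
    frequency=("date", "count"),
    monetary=("amount", "sum"),
    periods=("date", lambda d: d.dt.to_period("M").nunique()),
)
labels = KMeans(n_clusters=2, n_init=10, random_state=8).fit_predict(
    StandardScaler().fit_transform(rfm))
print(rfm.assign(cluster=labels))
```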
APA, Harvard, Vancouver, ISO, and other styles
48

Gina Regiana, Ade Irma Purnamasari, Agus Bahtiar, and Edi Tohidi. "K-Means Algorithm to Improve Leaf Image Clustering Model for Rice Disease Early Detection." Journal of Artificial Intelligence and Engineering Applications (JAIEA) 4, no. 2 (2025): 1156–60. https://doi.org/10.59934/jaiea.v4i2.840.

Full text
Abstract:
This research aims to improve the accuracy of rice leaf image clustering for early disease detection using the K-Means algorithm. The approach involves the Knowledge Discovery in Databases (KDD) method, which includes data selection, pre-processing, data transformation, data mining, evaluation, and presentation of results. The dataset consists of images of healthy leaves and leaves infected with diseases such as Bacterial Leaf Blight, Brown Spot, and Leaf Smut. The images are processed through grayscale conversion, noise removal, size adjustment, and data augmentation. The K-Means algorithm is applied to cluster image features based on visual similarity. Evaluation with the Silhouette Score showed that the best clustering was obtained at K=2 with a score of 0.8340, yielding two main clusters that separate healthy from infected images. The study concludes that the K-Means algorithm can improve the efficiency and accuracy of rice disease detection, helping farmers take early preventive measures and increasing agricultural productivity. This implementation shows significant potential for the development of smart agriculture technology.
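A rough sketch of the pre-processing and clustering pipeline, with synthetic arrays standing in for the grayscale leaf images (no real image files are assumed):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(9)

def preprocess(img):
    """Grayscale-style normalization and flattening of one 64x64 'leaf' image."""
    return (img.astype(np.float32) / 255.0).ravel()

# Synthetic stand-ins: darker images mimic lesioned leaves, lighter ones healthy.
healthy = [rng.integers(150, 255, (64, 64)) for _ in range(20)]
infected = [rng.integers(0, 110, (64, 64)) for _ in range(20)]
X = np.stack([preprocess(i) for i in healthy + infected])

labels = KMeans(n_clusters=2, n_init=10, random_state=9).fit_predict(X)
print("silhouette at K=2:", round(silhouette_score(X, labels), 4))
```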
APA, Harvard, Vancouver, ISO, and other styles
49

Bretas, Wagner Viana, Alline Sardinha Cordeiro Morais, Henrique Rego Monteiro da Hora, and Edson Terra Azevedo Filho. "Knowledge extraction on international markets from patent bases: a study on green patents." Brazilian Journal of Operations & Production Management 16, no. 4 (2019): 698–705. http://dx.doi.org/10.14488/bjopm.2019.v16.n4.a14.

Full text
Abstract:
Goal: This article aims to propose a model for stratifying technological information from the metadata contained in international patent bases, capable of supporting strategic decision making that strengthens actions directed at foreign trade.
 Design / Methodology / Approach: This applied research was based on the KDD (Knowledge Discovery in Databases) methodology and carried out a study focused on green patents. Patent bibliographic data published under the Patent Cooperation Treaty (PCT) from 2003 to 2012, focusing on alternative energies and more precisely on biofuels, were obtained from the Derwent database, with the search string based on the Green Patents IPC Inventory published by the World Intellectual Property Organization (WIPO). After treatment and sanitization, more than 36,000 resulting records were processed with the C4.5 algorithm (implemented as J48 in the Weka software), with Brazil as the destination country.
 Results: A decision tree was established in which Mexico was highlighted as the main discretionary country. The adhesion of the other emerging countries that, together with Brazil, compose the BRICS was also verified.
 Limitations of the investigation: The proposed model is limited to areas that show intensive use of technology in products and processes.
 Practical implications: The proposed method can help companies identify international markets that are more receptive to a given technology, using a free, reliable database that micro and small companies are also able to use.
 Originality / Value: Data mining applied to patent databases is still uncommon in the scientific literature; in this study, a BRICS cluster was identified among WIPO green patent deposits.
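A loose Python analogue of the J48 step, assuming scikit-learn: an entropy-criterion tree on a public dataset rather than the Derwent patent records, with the induced rules printed for inspection much as one would read J48's tree output:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Weka's J48 is a Java implementation of C4.5; an entropy-criterion
# scikit-learn tree is a rough analogue, shown here on a public dataset.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=10)
tree.fit(X, y)

# Inspect the induced rules, analogous to reading the J48 tree output.
print(export_text(tree, feature_names=load_iris().feature_names))
```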
APA, Harvard, Vancouver, ISO, and other styles
50

Dinar Danureksa, Moch Maliq, Rudi Kurniawan, and Yudhistira Arie Wijaya. "PENERAPAN ALGORITMA K-MEANS UNTUK OPTIMASI MODEL CLUSTERING DATA SUPPLIER DI APLIKASI SHOPEE." JATI (Jurnal Mahasiswa Teknik Informatika) 9, no. 1 (2025): 1676–84. https://doi.org/10.36040/jati.v9i1.12723.

Full text
Abstract:
Clustering is a data-analysis technique used to group data based on similarity of characteristics. In e-commerce, particularly on the Supplier Shopee platform, it can be used to improve the efficiency of strategic decision making. This study focuses on applying the K-Means algorithm to build an optimal clustering model for identifying supplier transaction patterns on that platform. The analysis method is Knowledge Discovery in Databases (KDD), consisting of five main stages: data selection, data pre-processing, transformation, application of the K-Means algorithm, and evaluation of the results. Experiments tested several values of K to find the best clustering, evaluated with the Davies-Bouldin Index (DBI), a metric that measures separation between clusters and cohesion within them. The results show that the best K is 2, with a DBI of 0.213, indicating well-separated and internally consistent clusters. Cluster 0 represents suppliers with few transactions, cluster 1 covers suppliers with high transaction volumes, and cluster 2 involves suppliers with moderate volumes. These results demonstrate that K-Means can effectively identify transaction patterns. This information is useful for formulating marketing strategies, such as incentives for low-transaction suppliers or strengthening relationships with top suppliers. The study confirms the importance of the K-Means algorithm and KDD in big-data analysis to support data-driven decision making in e-commerce.
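The model-selection loop this abstract describes (trying several K and keeping the lowest Davies-Bouldin Index) can be sketched as follows on synthetic supplier features:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(11)
# Placeholder supplier features, e.g. transaction count and total value.
X = np.vstack([rng.normal([10, 1e3], [3, 200], (80, 2)),
               rng.normal([120, 2e4], [20, 3e3], (40, 2))])

# Model selection as in the abstract: try several K, keep the lowest DBI.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=11).fit_predict(X)
    scores[k] = davies_bouldin_score(X, labels)
best_k = min(scores, key=scores.get)
print(f"best K={best_k} with DBI={scores[best_k]:.3f}")
```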
APA, Harvard, Vancouver, ISO, and other styles