Academic literature on the topic 'Keyword: Data Mining'

Author: Grafiati

Published: 6 June 2025

Last updated: 16 July 2025

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Keyword: Data Mining.'

Journal articles on the topic "Keyword: Data Mining"

Hong, Jae-Won, and Seung-Bae Park. "The Identification of Marketing Performance Using Text Mining of Airline Review Data." Mobile Information Systems 2019 (January 2, 2019): 1–8. http://dx.doi.org/10.1155/2019/1790429.

Abstract:

We aim, first, to extract major keywords using text mining; second, to identify prominent keywords among those extracted; and third, to confirm differences in how these keywords influence corporate performance. The results are as follows. First, the keywords were found to show distinctive features. Because the keywords posted by customers showed certain tendencies, airlines need to manage their services by identifying service properties through keyword analysis. Second, prominent keywords were identified among the keywords extracted through text mining. Some of these keywords were significantly correlated with marketing performance, but others were not. This implies that a company could uncover consumers’ needs through the prominent keywords, and that managing the properties related to them would help improve corporate performance. Third, “recommend” should be treated separately from “satisfaction” in keyword-based service management. The results offer strategic implications for practical business settings by analyzing industry keywords with text mining. We believe this work, which aims to establish common ground for understanding these analyses across multiple disciplinary perspectives, will encourage further research on and development of the service industry.

Baumgarten, M., M. D. Mulvenna, N. Rooney, and J. Reid. "Keyword-Based Sentiment Mining using Twitter." International Journal of Ambient Computing and Intelligence 5, no. 2 (2013): 56–69. http://dx.doi.org/10.4018/jaci.2013040104.

Abstract:

Big Data are the new frontier for businesses and governments alike. Dealing with big data and extracting valuable, actionable knowledge from it poses one of the biggest challenges in computing and, simultaneously, provides one of the greatest opportunities for business, government and society. The content produced by the social media community, and in particular the microblogging community, is one of the most opinion- and knowledge-rich, real-time accessible, expressive and diverse data sources, both in terms of the content itself and of context-related knowledge such as user profiles, including user relations. Harnessing this embedded knowledge, in particular the underlying opinions about certain topics, and gaining a deeper understanding of the overall context will provide new opportunities to incorporate user opinions and preferences. This paper discusses a keyword-based classifier for sentiment mining of short messages. It outlines a simple classification mechanism that could be extended to include additional sentiment dimensions. Eventually, this could provide a deeper understanding of user preferences, which in turn could actively, and in almost real time, influence further development activities or marketing campaigns.
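
A minimal Python sketch of a keyword-based sentiment classifier for short messages. It assumes two small hand-made keyword lists (hypothetical; the paper's lexicon is not given in the abstract) and a simple rule that compares positive and negative keyword hits. It is an illustration of the general idea, not the authors' classifier.

```python
import re

# Hypothetical keyword lists; the paper's actual lexicon is not specified.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "sad", "terrible"}

def tokenize(text: str) -> list[str]:
    """Lower-case the message and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def classify(text: str) -> str:
    """Label a short message by the difference between positive and negative keyword hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Additional sentiment dimensions could be added in the same style, as further keyword sets with their own counts.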

Huang, Yue, Hu Liu, and Jing Pan. "Identification of data mining research frontier based on conference papers." International Journal of Crowd Science 5, no. 2 (2021): 143–53. http://dx.doi.org/10.1108/ijcs-01-2021-0001.

Abstract:

Purpose: Identifying the frontiers of a specific research field is one of the most basic tasks in bibliometrics. Research published in leading conferences is crucial to the data mining research community, yet few studies have focused on it. The purpose of this study is to detect the intellectual structure of data mining based on conference papers.
Design/methodology/approach: This study takes as its sample the papers of the top nine data mining conferences ranked by Google Scholar Metrics. Based on paper counts, it first examines the annual volume of published documents and their distribution across conferences. Then, from the perspective of keywords, CiteSpace is used to analyze the conference papers and identify the frontiers of data mining, focusing on keyword term frequency, keyword betweenness centrality, keyword clustering and burst keywords.
Findings: Research activity in data mining followed a linear upward trend from 2007 to 2016. Frontier identification based on the conference papers showed five research hotspots in data mining: clustering, classification, recommendation, social network analysis and community detection. The research content of the conference papers was also very rich.
Originality/value: This study detected the research frontier from leading data mining conference papers. Based on the keyword co-occurrence network, it identified and analyzed the research frontiers of the data mining discipline from 2007 to 2016 along four dimensions: keyword term frequency, betweenness centrality, cluster analysis and burst analysis.
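
CiteSpace derives centrality, clusters and bursts from a keyword co-occurrence network. The first step, counting keyword frequencies and co-occurrence edge weights, can be sketched in Python as follows. The input format (one list of author keywords per paper) is an assumption, and CiteSpace's own processing is not reproduced.

```python
from collections import Counter
from itertools import combinations

def keyword_stats(papers):
    """Count keyword frequency and pairwise co-occurrence across papers.

    `papers` is a list of keyword lists, one per paper (hypothetical input format).
    """
    freq = Counter()
    cooc = Counter()
    for keywords in papers:
        # Normalise case and count each keyword at most once per paper.
        unique = sorted({k.lower() for k in keywords})
        freq.update(unique)
        # Each unordered keyword pair on the same paper adds 1 to that edge's weight.
        cooc.update(combinations(unique, 2))
    return freq, cooc
```

The resulting `cooc` counter is a weighted edge list that a graph library could use to compute centrality or clustering.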

Singh, Ashishika, and S. Babu. "Travel Route Recommendation System using User Keyword Search." International Journal of Recent Technology and Engineering (IJRTE) 8, no. 6 (2020): 2052–56. http://dx.doi.org/10.35940/ijrte.f7275.038620.

Abstract:

Travel and tourism is a field that has grown substantially over the past few decades. Competitive marketing and the need to deliver a satisfying customer experience have given today’s technological advances many opportunities to play a crucial role in it. Two such technologies are Big Data and data mining. Data mining draws on statistics, mathematics, machine learning and artificial intelligence. It aims to identify novel, valid and potentially useful patterns and to understand correlations. Data mining on a Big Data platform such as Hadoop can help analyze data and derive information that supports the growth of the industry and gives customers accurate suggestions. Hadoop is used because it can handle all kinds of data, whether structured or unstructured. The main objective of this project follows the same principle: providing the best customer experience. By combining the data analytics power of data mining, Big Data and the programming capabilities of Java, this project focuses on building a customer-centric Keyword Aware Travel Route Framework.

Ramasamy, S., and K. Nirmala. "Disease prediction in data mining using association rule mining and keyword based clustering algorithms." International Journal of Computers and Applications 42, no. 1 (2017): 1–8. http://dx.doi.org/10.1080/1206212x.2017.1396415.

Chen, Zhengxin. "Database Keyword Search: Beyond Its Current Landscape." International Journal of Information Technology & Decision Making 11, no. 2 (2012): 491–500. http://dx.doi.org/10.1142/s0219622012400123.

Abstract:

As an active research field, database keyword search (KWS) has put much emphasis on performance issues because of its high computational cost. However, a closer examination of KWS reveals other interesting aspects worth noting. In this paper, we examine KWS from a broader perspective and analyze its profound implications. Freed from syntax-related considerations, KWS users have better opportunities to explore data in whatever way they wish, and such exploration may reveal useful insights into the hidden nature of the data, thus benefiting how the data are used. In particular, we examine the potential of KWS for data mining from the behavior mining perspective. This examination also leads us to a discussion of viewing KWS in the "web of life" context. We further discuss connections between KWS and other related concepts, including dataspaces. Based on our findings and critical examination, we also provide a brief overview of an integrated environment that facilitates the interplay of KWS and data mining.

Hwang, Jae-Min, Seo-Bin Hong, and Cheol-Soo Kang. "NAVER News Data Text Mining Analysis: Focusing on the Keyword ‘Algorithm’." Journal of Innovation Industry Technology 2, no. 1 (2024): 1–7. http://dx.doi.org/10.60032/jiit.2024.2.1.1.

Yoo, Taesang. "Analysis of Uzbekistan's Relations with China, Russia, and South Korea: Utilizing Text Mining Based on GDELT Big Data." Critique Open Research & Review 2, no. 4 (2024): 1–7. http://dx.doi.org/10.55640/corr-v02i04-01.

Abstract:

GDELT (Global Database of Events, Language, and Tone), a comprehensive event dataset developed in 2013, provides quantifiable data on cooperative and conflictual relations between countries, as well as on events related to specific phenomena. Its utility has been widely recognized, and it is actively used in international relations and foreign policy research. This study also uses GDELT data to introduce a method for analyzing the relationships between Uzbekistan and China, Russia, and South Korea through keyword correlation indicators derived from text mining, and to present the actual analysis results. By conducting a keyword correlation analysis based on event data between Uzbekistan and the three analyzed countries (China, Russia, and South Korea), it was possible to identify diplomatic action patterns through the correlation of core keywords such as "engage," "express intent," and "make" with keywords indicating specific actions. Under the Mirziyoyev government, Uzbekistan’s foreign policy has shown a diplomatic action pattern of seeking or participating in negotiations, expressing intentions for negotiation or cooperation, and pursuing meetings for negotiation and cooperation with the analyzed countries, while inviting or visiting counterpart countries as part of these processes. This demonstrates that Uzbekistan has actively worked towards establishing cooperative relations, which President Mirziyoyev set as the goal or direction of its foreign relations under the name “Good Neighbor Policy.” Such keyword correlation indicator analysis facilitates the explanation of cooperative relationships.

Mishra, Pawan, Preeti Srivastav, Priyanshu Sharma, and Mohd Atif. "AI-Based Medical Data Mining." International Journal of Innovative Research in Advanced Engineering 11, no. 11 (2024): 794–99. https://doi.org/10.26562/ijirae.2024.v1111.01.

Abstract:

The analysis of real-time medical data is a transformative approach that combines machine learning, natural language processing, and web scraping technologies to offer timely, accurate, and relevant information for healthcare decision-making. This paper presents a system that, given a medical keyword as input, retrieves the top Google links, assesses the content for accuracy, and organizes the data in a structured form. The system also summarizes the information into concise, digestible points for quick reference. This work provides healthcare professionals, researchers, and policymakers with a powerful tool for retrieving, analyzing, and synthesizing real-time online medical data.

Jeong, Wuseong, JungJin Kim, and Hanseok Jeong. "Information Extraction from Unstructured Data on Microplastics through Text Mining." Journal of Korean Society of Environmental Engineers 45, no. 1 (2023): 34–42. http://dx.doi.org/10.4491/ksee.2023.45.1.34.

Abstract:

Objectives: In this study, we seek to provide a thorough insight into how people perceive microplastics and to uncover issues and hidden trends in the significant problem of microplastic pollution by analyzing unstructured data on microplastics.
Methods: Environmental news articles related to microplastics were collected. Text mining techniques, including data pre-processing, word clouds, TF-IDF weight-based trend analysis, and LDA topic modeling, were used to analyze the textual data.
Results and Discussion: According to an analysis of all environmental news and the keyword ‘microplastic’ from 2014 to 2021, conducted via BIGKinds, the public's interest in microplastics is growing steadily. The keyword ‘trash’ carried by far the greatest weight among all words. The top five keywords connected to microplastics did not fade away but kept appearing, even though the socially prominent keywords varied from year to year during the study period. This indicates that the primary issues with microplastics reflected in these keywords have not yet been solved. Our study is limited in subject diversity because we focused only on microplastic news. The results, however, covered every stage from the emergence of plastic pollution to its treatment, such as microplastic pollution sources, microplastic detection, and prevention methods against microplastics.
Conclusion: Text mining analysis was performed on microplastics in environmental news and revealed issues and trends in microplastic pollution. This study presents a new methodology for analyzing environmental and social problems, suggesting that it could enable a multidimensional understanding of environmental problems and help establish environmental policies.
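
The study relies on TF-IDF weights. A minimal Python sketch of plain TF-IDF, assuming whitespace-tokenized text (Korean news would in practice need a morphological analyzer) and the unsmoothed form idf = log(N / df); the exact weighting used by the authors or by BIGKinds is not stated in the abstract.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, using (count / length) * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

With this form, a term that appears in every document (for example, the query keyword itself) gets weight zero, while a term concentrated in a few documents is weighted up.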

Dissertations / Theses on the topic "Keyword: Data Mining"

Thambiratnam, Albert J. K. "Acoustic keyword spotting in speech with applications to data mining." Thesis, Queensland University of Technology, 2005. https://eprints.qut.edu.au/37254/1/Albert_Thambiratnam_Thesis.pdf.

Abstract:

Keyword spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers.
The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.

Li, Hanzhe. "Sentiment Analysis and Opinion Mining on Twitter with GMO Keyword." Thesis, North Dakota State University, 2016. http://hdl.handle.net/10365/25787.

Abstract:

Twitter is a new source of information for data mining techniques. Messages posted on Twitter provide a major information source for gauging public sentiment on topics ranging from politics to fashion trends. The purpose of this paper is to analyze tweets to discern users' opinions regarding Genetically Modified Organisms (GMOs). We examine the effectiveness of several classifiers, namely Multinomial Naïve Bayes, Bernoulli Naïve Bayes, Logistic Regression and Linear Support Vector Classifier (SVC), in assigning tweets in a corpus to a positive, negative or neutral category. Additionally, we use three datasets in this experiment to examine which dataset yields the best score. Comparing the classifiers, we found that the GMO_NDSU dataset had the highest score of the three datasets for every classifier in the experiment, and that Linear SVC had the highest and most consistent accuracy when using bigrams for feature extraction and term frequency and chi-square for feature selection.
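
One of the classifiers compared above is Multinomial Naïve Bayes, and bigrams are named as features. A minimal pure-Python sketch of that combination with Laplace (add-one) smoothing follows; the toy training tweets are hypothetical, and the thesis's chi-square feature selection and the other classifiers are not reproduced.

```python
import math
from collections import Counter, defaultdict

def bigrams(text):
    """Turn a tweet into word-bigram features, e.g. 'gmo food is' -> ['gmo_food', 'food_is']."""
    words = text.lower().split()
    return [f"{a}_{b}" for a, b in zip(words, words[1:])]

def train(examples):
    """Collect class counts and per-class bigram counts from (text, label) pairs."""
    priors = Counter()
    counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        feats = bigrams(text)
        priors[label] += 1
        counts[label].update(feats)
        vocab.update(feats)
    return priors, counts, vocab

def predict(model, text):
    """Return the label with the highest add-one-smoothed log posterior."""
    priors, counts, vocab = model
    n = sum(priors.values())
    best, best_score = None, -math.inf
    for label in priors:
        total = sum(counts[label].values())
        score = math.log(priors[label] / n)
        for feat in bigrams(text):
            score += math.log((counts[label][feat] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best
```

A third label such as "neutral" needs no code change; it only has to appear in the training pairs.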

Agarwal, Virat. "Algorithm design on multicore processors for massive-data analysis." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34839.

Abstract:

Analyzing massive data sets and streams is computationally very challenging. Data sets in systems biology, network analysis and security use network abstraction to construct large-scale graphs. Graph algorithms such as traversal and search are memory-intensive and typically require very little computation, with access patterns that are irregular and fine-grained. Increasing streaming data rates in domains such as security, mining, and finance leave algorithm designers with only a handful of clock cycles (with current general-purpose computing technology) to process every incoming byte of data in-core in real time. This, along with the increasing complexity of mining patterns and other analytics, puts further pressure on already high computational requirements. Processing streaming data in finance comes with an additional constraint: data must be processed at low latency, which prevents the algorithm from using common techniques such as batching to obtain high throughput. The primary contributions of this dissertation are the design of novel parallel data analysis algorithms for graph traversal on large-scale graphs, pattern recognition and keyword scanning on massive streaming data, financial market data feed processing and analytics, and data transformation. These designs capture the machine-independent aspects, to guarantee portability with performance to future processors, alongside high-performance implementations on multicore processors that embed processor-specific optimizations. Our breadth-first search graph traversal algorithm demonstrates the capability to process massive graphs with billions of vertices and edges on commodity multicore processors at rates that are competitive with supercomputing results in the recent literature.
We also present high-performance, scalable keyword scanning on streaming data using a novel automata compression algorithm, a model of computation based on small software content addressable memories (CAMs), and a unique data layout that forces data reuse and minimizes memory traffic. Using a high-level algorithmic approach to process financial feeds, we present a solution that decodes and normalizes option market data at rates an order of magnitude higher than the current needs of the market, yet is portable and flexible enough for other feeds in this domain. In this dissertation, we discuss in detail the algorithm design challenges of processing massive data and present solutions and techniques that we believe can be used and extended to solve future research problems in this domain.

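Matička, Jiří. "Extrakce klíčových slov z dokumentů" [Keyword Extraction from Documents]. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií [Brno University of Technology, Faculty of Information Technology], 2012. http://www.nusl.cz/ntk/nusl-236533.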

Abstract:

This thesis addresses the automated extraction of keywords from documents. Its goal is to design and implement an application that can extract an appropriate set of keywords related to the contents of a document. The major requirements for the application are speed and accuracy. The first part of the thesis therefore reviews existing approaches and classifies them in detail according to various criteria. The second part focuses on choosing one of these methods for keyword extraction and describing in detail how it works. The following parts contain a detailed design of the application and its implementation. Finally, the last chapter, which is particularly important, tests the application on a set of text documents and evaluates the final results of the extraction process.

Wallace, Roy Geoffrey. "Fast and accurate phonetic spoken term detection." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/39610/1/Roy_Wallace_Thesis.pdf.

Abstract:

For the first time in human history, large volumes of spoken audio are being broadcast, made available on the internet, archived, and monitored for surveillance every day. New technologies are urgently required to unlock these vast and powerful stores of information. Spoken Term Detection (STD) systems provide access to speech collections by detecting individual occurrences of specified search terms. The aim of this work is to develop improved STD solutions based on phonetic indexing. In particular, this work aims to develop phonetic STD systems for applications that require open-vocabulary search, fast indexing and search speeds, and accurate term detection. Within this scope, novel contributions are made within two research themes, namely accommodating phone recognition errors and modelling uncertainty with probabilistic scores. A state-of-the-art Dynamic Match Lattice Spotting (DMLS) system is used to address the problem of accommodating phone recognition errors with approximate phone sequence matching. Extensive experimentation on the use of DMLS is carried out, and a number of novel enhancements are developed that provide faster indexing, faster search, and improved accuracy. Firstly, a novel comparison of methods for deriving a phone error cost model is presented to improve STD accuracy, resulting in up to a 33% improvement in the Figure of Merit. A method is also presented for drastically increasing the speed of DMLS search by at least an order of magnitude with no loss in search accuracy. An investigation is then presented of the effects of increasing indexing speed for DMLS by using simpler modelling during phone decoding, with results highlighting the trade-off between indexing speed, search speed and search accuracy. The Figure of Merit is further improved by up to 25% using a novel proposal to utilise word-level language modelling during DMLS indexing.
Analysis shows, however, that this use of language modelling can be unhelpful or even disadvantageous for terms with a very low language model probability. The DMLS approach to STD involves generating an index of phone sequences using phone recognition. An alternative approach to phonetic STD is also investigated that instead indexes probabilistic acoustic scores in the form of a posterior-feature matrix. A state-of-the-art system is described, and its use for STD is explored through several experiments on spontaneous conversational telephone speech. A novel technique and framework is proposed for discriminatively training such a system to directly maximise the Figure of Merit. This results in a 13% improvement in the Figure of Merit on held-out data. The framework is also found to be particularly useful for index compression in conjunction with the proposed optimisation technique, providing a substantial index compression factor in addition to an overall gain in the Figure of Merit. These contributions significantly advance the state of the art in phonetic STD by improving the utility of such systems in a wide range of applications.
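
Approximate phone sequence matching with a phone error cost model can be illustrated by a weighted minimum edit distance between a target phone sequence and a recognised one. This is a minimal Python sketch, not the thesis's DMLS implementation: `toy_cost` and its values are hypothetical stand-ins for a cost model derived from phone recognition errors, and insertions and deletions are assumed to cost 1.

```python
def phone_distance(target, hyp, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Weighted minimum edit distance between two phone sequences.

    `sub_cost(a, b)` is the cost of target phone a being recognised as phone b.
    """
    n, m = len(target), len(hyp)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost,  # target phone missing from the hypothesis
                d[i][j - 1] + ins_cost,  # extra phone in the hypothesis
                d[i - 1][j - 1] + sub_cost(target[i - 1], hyp[j - 1]),  # match or substitution
            )
    return d[n][m]

def toy_cost(a, b):
    """Hypothetical costs: exact match is free, a few confusable vowels are cheap."""
    if a == b:
        return 0.0
    if {a, b} <= {"ih", "iy", "eh"}:
        return 0.5
    return 1.0
```

A search term could then be reported wherever the distance to an indexed phone sequence stays below a chosen threshold; the threshold and the cost values are tuning assumptions here.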

Daglar, Toprak Seda. "A New Hybrid Multi-relational Data Mining Technique." Master's thesis, METU, 2005. http://etd.lib.metu.edu.tr/upload/12606150/index.pdf.

Abstract:

Multi-relational learning has become popular due to the limitations of propositional problem definitions in structured domains and the tendency to store data in relational databases. As patterns involve multiple relations, the search space of possible hypotheses becomes intractably complex. Many relational knowledge discovery systems have been developed employing various search strategies, search heuristics and pattern language limitations in order to cope with the complexity of the hypothesis space. In this work, we propose a relational concept learning technique that adopts concept descriptions as associations between the concept and the preconditions to this concept, and employs a relational upgrade of the association rule mining search heuristic, the APRIORI rule, to effectively prune the search space. The proposed system is a hybrid predictive inductive logic system, which utilizes inverse resolution for generalization of concept instances in the presence of background knowledge and refines these general patterns into frequent and strong concept definitions with a modified APRIORI-based specialization operator. Two versions of the system are tested on three real-world learning problems: learning a linearly recursive relation, predicting the carcinogenicity of molecules in the Predictive Toxicology Evaluation (PTE) challenge, and mesh design. Results of the experiments show that the proposed hybrid method is competitive with state-of-the-art systems.
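
For orientation, the classic propositional Apriori frequent-itemset search is sketched below in Python. It shows the downward-closure pruning (every subset of a frequent itemset must itself be frequent) that the APRIORI heuristic is named after; it does not implement the thesis's relational upgrade, inverse resolution or concept learning.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while level:
        # Count the support of each size-k candidate in one pass over the data.
        counts = {c: sum(c <= t for t in transactions) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        # Join frequent k-itemsets into (k+1)-candidates ...
        keys = list(current)
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        # ... and prune any candidate that has an infrequent k-subset.
        level = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k))
        }
        k += 1
    return frequent
```

Pruning before counting is what keeps the search tractable: a candidate is only counted against the data if all of its smaller subsets already passed the support threshold.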

Chen, Chen-yu (陳貞諭). "Patent keyword analysis and data mining." Master's thesis, Feng Chia University, Graduate Institute of Industrial Engineering and Systems Management (ROC academic year 98), 2010. http://ndltd.ncl.edu.tw/handle/75388879567317736075.

Abstract:

In the era of the knowledge economy, patents are a specific embodiment of technological innovation and intellectual property. Patents contain important research results, and their technical methods are protected by law. Patent analysis relies heavily on manual work, which restricts its efficiency. In this study, we first use text mining (TM) to transform patent documents into structured data and identify keyword vectors. Second, we use principal component analysis (PCA) to reduce the number of keyword vectors. Third, varimax rotation is applied to find more clearly defined factors that are easier to interpret. PCA shows up to 23 significant PCs, which account for 70% of the variance; varimax rotation shows up to 25 significant PCs, which account for 70% of the variance. In this respect there is no significant difference between PCA and varimax rotation. Comparing the loadings, however, shows that varimax rotation performs better than PCA. A company that wants to know whether a new patent infringes existing rights can select relevant patents by patent keyword. With PCA, experts must read 66 patents; with varimax rotation, they need to read only 25. Varimax rotation thus reduces the number of patents to be read by 41 compared with PCA.

Pradhan, Anima. "Helmholtz Principle-Based Keyword Extraction." Thesis, 2013. http://ethesis.nitrkl.ac.in/5048/1/211CS1048_(1).pdf.

Abstract:

In today’s world of evolving technology, everybody wishes to accomplish tasks in the least time. As the amount of information available online grows every day, it becomes very difficult to summarize more than 100 documents in acceptable time. Thus, “text summarization” is a challenging problem in the area of Natural Language Processing (NLP), especially in the context of global languages. In this thesis, we survey the taxonomy of text summarization from different aspects. The thesis briefly explains different approaches to summarization and the evaluation parameters. It also presents thorough details and facts about more than fifty automatic text summarization systems to ease the job of researchers and serve as a short encyclopedia of the investigated systems. Keyword extraction methods play a vital role in text mining and document processing. Keywords represent the essential content of a document, and text mining applications take advantage of keywords when processing documents. A high-quality keyword is a word that succinctly represents the exact content of the text. It is very difficult to process a large number of documents to obtain high-quality keywords in acceptable time. This thesis compares the most popular keyword extraction method, tf-idf, with the proposed method based on the Helmholtz Principle. The Helmholtz Principle draws on ideas from image processing and is derived from the Gestalt theory of human perception. We also investigate the run time of keyword extraction with both methods. Experimental results show that the keyword extraction method based on the Helmholtz Principle outperforms tf-idf.
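
The abstract does not give the scoring rule. The sketch below uses a formulation from the Helmholtz-principle keyword-extraction literature, which we assume (without confirmation from the abstract) is close to the thesis's method: split a document into N parts; for a word occurring K times in the whole document and m times in part P, the number of false alarms is NFA = C(K, m) / N^(m-1), and meaning = -(1/m) ln NFA. A word is treated as meaningful in P when its meaning exceeds a threshold epsilon (0 here, an assumed default).

```python
import math
from collections import Counter

def meaningful_words(parts, epsilon=0.0):
    """Score words per part with an assumed Helmholtz-principle 'meaning' value.

    parts: list of token lists (e.g. the paragraphs of one document).
    Returns {part_index: {word: meaning}} for words with meaning > epsilon.
    """
    n_parts = len(parts)
    totals = Counter(w for part in parts for w in part)  # K for each word
    result = {}
    for idx, part in enumerate(parts):
        scores = {}
        for word, m in Counter(part).items():
            # NFA = C(K, m) / N**(m - 1): expected count of such a burst by chance.
            nfa = math.comb(totals[word], m) / n_parts ** (m - 1)
            meaning = -math.log(nfa) / m
            if meaning > epsilon:
                scores[word] = meaning
        result[idx] = scores
    return result
```

A word concentrated in one part scores high, while a word spread evenly across all parts, such as a function word, gets a negative meaning and is dropped without any stop list.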

Thomas, Justine Raju. "Keyword Detection in Text Summarization." Thesis, 2015. http://ethesis.nitrkl.ac.in/7964/1/2015_Keyword_Thomas.pdf.

Abstract:

Summarization is the process of reducing a text document to create a summary that retains the most important points of the original document. As the problem of information overload has grown, and as the quantity of data has increased, so has interest in automatic summarization. Extractive summarization selects from the given text the sentences that best convey its message. Most extractive summarization techniques revolve around indexing keywords and extracting the sentences that contain more keywords than the rest. Keyword extraction is usually done by selecting important words that occur more frequently than others, with the stress on important. However, current techniques for judging importance rely on a stop list, which might include words that are critically important to the text. In this thesis, I present a work in progress toward an algorithm that extracts truly significant keywords, which might lose their significance if subjected to current keyword extraction algorithms.
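
The frequency-plus-stop-list baseline described above can be sketched in a few lines of Python. Everything here is an assumption for illustration (the tiny stop list, the regex sentence splitter, and scoring sentences by keyword count); it is the baseline the thesis critiques, not its proposed algorithm.

```python
import re
from collections import Counter

# Hypothetical stop list; any important word placed here can never become a keyword.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it"}

def summarize(text, n_keywords=3, n_sentences=1):
    """Baseline extractive summary: keep the sentences containing the most frequent keywords."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    keywords = {w for w, _ in Counter(words).most_common(n_keywords)}

    def score(sentence):
        # Number of keyword occurrences in the sentence.
        return sum(w in keywords for w in re.findall(r"[a-z]+", sentence.lower()))

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Return the chosen sentences in their original order.
    return [s for s in sentences if s in ranked]
```

For example, with the text "Solar power is cheap. Solar panels convert light to power. The weather was nice." and two keywords, "solar" and "power" are selected and the first sentence is returned as the one-sentence summary.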

Singhal, Harsh. "A new framework of optimizing keyword weights in text categorization and record querying." 2008. http://hdl.rutgers.edu/1782.2/rucore10001600001.ETD.17392.

Books on the topic "Keyword: Data Mining"

Paul, Dimple Valayil. Developing a Keyword Extractor and Document Classifier: Emerging Research and Opportunities. IGI Global, 2021.

Hilgurt, S. Ya, and O. A. Chemerys. Reconfigurable signature-based information security tools of computer systems. PH “Akademperiodyka”, 2022. http://dx.doi.org/10.15407/akademperiodyka.458.297.

Abstract:

The book is devoted to the research and development of methods for combining computational structures in reconfigurable signature-based information protection tools for computer systems and networks in order to increase their efficiency. Network security tools based on, among others, AI approaches such as deep neural networks, despite the great progress shown in recent years, still suffer from a nonzero probability of recognition error. Even a low probability of such an error in critical infrastructure can be disastrous. Therefore, signature-based recognition methods, with their theoretically exact matching, remain relevant for information security systems such as network intrusion detection systems, antivirus, anti-spam, and worm-containment systems. Real-time multi-pattern string matching has been a major performance bottleneck in such systems. To speed up recognition, developers use a reconfigurable hardware platform based on FPGA devices, which offers nearly the flexibility of software with near-ASIC performance. In terms of efficiency, the most important component of a signature-based information security system is the recognition module, in which the multi-pattern matching task is directly solved. It must not only check each byte of input data, at speeds of tens to hundreds of gigabits per second, against hundreds of thousands or even millions of patterns in the signature database, but also change its structure every time a new signature appears or the operating conditions of the protected system change. An analysis of numerous examples of reconfigurable information security system development identified the three most promising approaches to building the hardware circuits of recognition modules: content-addressable memory based on digital comparators, Bloom filters, and Aho–Corasick finite automata.
A method for fast quantification of the components of the recognition module and of the entire system was proposed. The method makes it possible to avoid resource-intensive procedures for synthesizing digital circuits on FPGAs when building complex reconfigurable information security systems and their components. To improve the efficiency of the systems under study, structural-level combinational methods are proposed. These allow several matching schemes, built on different approaches and their modifications, to be combined into a single recognition device so that their advantages are enhanced and their disadvantages eliminated. Optimization methods are used to achieve the maximum efficiency of the combining methods. The methods of parallel combining, sequential cascading, and vertical junction have been formulated and investigated. The principle of multi-level combining of combining methods is also considered and researched. Algorithms implementing the proposed combining methods have been developed, and software has been created for conducting experiments with the developed methods and tools. Quantitative estimates are obtained of the efficiency gains in constructing recognition modules that result from using combining methods. The optimization of reconfigurable devices described in hardware description languages is considered. A modification of the method of affine transformations was presented that allows parallelizing loops that cannot be optimized by other methods. To facilitate the practical application of the developed methods and tools, a web service using high-performance grid and cloud computing technologies was considered. The proposed methods for increasing the efficiency of the matching procedure can also be used to solve important problems in other fields of science, such as data mining and the analysis of DNA molecules.
Keywords: information security, signature, multi-pattern matching, FPGA, structural combining, efficiency, optimization, hardware description language.
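
The abstract names Aho–Corasick finite automata as one of three hardware approaches to multi-pattern matching. The software sketch below only illustrates the underlying algorithm: a trie of signatures plus failure links, so that every input byte is consumed once regardless of how many patterns there are. It is not the FPGA design from the book, and the function names are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick goto/fail/output tables for a list of byte-string patterns."""
    goto = [{}]    # goto[state][byte] -> next state (trie edges)
    fail = [0]     # failure link of each state (0 = root)
    out = [set()]  # patterns recognised on reaching each state
    for p in patterns:
        state = 0
        for b in p:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].add(p)
    # Breadth-first pass: a state's failure link points to the longest proper
    # suffix of its path that is also a prefix of some pattern.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        s = queue.popleft()
        for b, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and b not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(b, 0)
            out[t] |= out[fail[t]]  # inherit matches that end at the suffix state
    return goto, fail, out


def search(data, automaton):
    """Yield (start_offset, pattern) for every pattern occurrence in `data`."""
    goto, fail, out = automaton
    state = 0
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for p in out[state]:
            yield i - len(p) + 1, p
```

With the textbook pattern set `[b"he", b"she", b"his", b"hers"]`, `sorted(search(b"ushers", build_automaton(...)))` gives `[(1, b"she"), (2, b"he"), (2, b"hers")]`. Overlapping matches are all reported in a single left-to-right scan, the property that makes the automaton attractive for line-rate signature matching.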

Book chapters on the topic "Keyword: Data Mining"

Wang, Dong, Lei Zou, Wanqiong Pan, and Dongyan Zhao. "Keyword Graph: Answering Keyword Search over Large Graphs." In Advanced Data Mining and Applications. Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-35527-1_53.

Lorensuhewa, Aruna, Binh Pham, and Shlomo Geva. "Style Recognition Using Keyword Analysis." In Mining Multimedia and Complex Data. Springer Berlin Heidelberg, 2003. http://dx.doi.org/10.1007/978-3-540-39666-6_17.

Song, Xiaoxu, Bin Wang, Jing Sun, and Rong Pu. "TOP-R Keyword-Aware Community Search." In Advanced Data Mining and Applications. Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-65390-3_20.

Li, Chenyang, and Leigang Dong. "Boolean Spatial Temporal Text Keyword Skyline Query." In Advanced Data Mining and Applications. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-46677-9_15.

Li, Yun, Ziheng Wang, Jing Chen, Fei Wang, and Jiajie Xu. "Multiple Query Point Based Collective Spatial Keyword Querying." In Advanced Data Mining and Applications. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-35231-8_5.

Jiang, Mengxia, Yueguo Chen, Jinchuan Chen, and Xiaoyong Du. "Interactive Predicate Suggestion for Keyword Search on RDF Graphs." In Advanced Data Mining and Applications. Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-25856-5_8.

Qiu, Qiang, Yang Zhang, Junping Zhu, and Wei Qu. "Building a Text Classifier by a Keyword and Wikipedia Knowledge." In Advanced Data Mining and Applications. Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-03348-3_28.

Wang, Haixun, and Charu C. Aggarwal. "A Survey of Algorithms for Keyword Search on Graph Data." In Managing and Mining Graph Data. Springer US, 2010. http://dx.doi.org/10.1007/978-1-4419-6045-0_8.

Yang, Weidong, Hao Zhu, Nan Li, and Guansheng Zhu. "Adaptive and Effective Keyword Search for XML." In Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-20841-6_35.

Yang, Weidong, and Hao Zhu. "Semantic-Distance Based Clustering for XML Keyword Search." In Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-13672-6_39.

Conference papers on the topic "Keyword: Data Mining"

Dai, Zheng, Xinping Zhao, and Bingtao Cui. "TFIDF Text Keyword Mining Method Based on Hadoop Distributed Platform Under Massive Data." In 2024 IEEE 2nd International Conference on Image Processing and Computer Applications (ICIPCA). IEEE, 2024. http://dx.doi.org/10.1109/icipca61593.2024.10709136.

Du, Xingke, Ning Ouyang, and Xiaodong Cai. "A Text Summarization Model Based on Dual Pointer Network Fused with Keywords." In The International Conference on Data Mining, E-Learning, and Information Systems. SCITEPRESS - Science and Technology Publications, 2024. http://dx.doi.org/10.5220/0012881900004536.

Wen, Yu-Ting, Kae-Jer Cho, Wen-Chih Peng, Jinyoung Yeo, and Seung-won Hwang. "KSTR: Keyword-Aware Skyline Travel Route Recommendation." In 2015 IEEE International Conference on Data Mining (ICDM). IEEE, 2015. http://dx.doi.org/10.1109/icdm.2015.37.

Hu, Xinghua, and Bin Wu. "Automatic Keyword Extraction Using Linguistic Features." In 2006 6th IEEE International Conference on Data Mining Workshops. IEEE, 2006. http://dx.doi.org/10.1109/icdmw.2006.36.

Li, Sai-Ming, Sanjeev Seereeram, Raman K. Mehra, and Chris Miles. "Context-sensitive keyword selection using text data mining." In AeroSense 2002, edited by Belur V. Dasarathy. SPIE, 2002. http://dx.doi.org/10.1117/12.460238.

Wagatsuma, Takashi, Yuichi Yaguchi, and Ryuichi Oka. "Cross-Media Data Mining Using Associated Keyword Space." In 2009 Ninth IEEE International Conference on Computer and Information Technology. IEEE, 2009. http://dx.doi.org/10.1109/cit.2009.41.

Zhang, Yu, Frank F. Xu, Sha Li, et al. "HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories." In 2019 IEEE International Conference on Data Mining (ICDM). IEEE, 2019. http://dx.doi.org/10.1109/icdm.2019.00098.

Joshi, Amruta, and Rajeev Motwani. "Keyword Generation for Search Engine Advertising." In Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06). IEEE, 2006. http://dx.doi.org/10.1109/icdmw.2006.104.

Hossny, Ahmad Hany, and Lewis Mitchell. "Event Detection in Twitter: A Keyword Volume Approach." In 2018 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2018. http://dx.doi.org/10.1109/icdmw.2018.00172.

Veeramani, Hariram, Surendrabikram Thapa, and Usman Naseem. "Temporally Dynamic Session-Keyword Aware Sequential Recommendation System." In 2023 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2023. http://dx.doi.org/10.1109/icdmw60847.2023.00027.

Reports on the topic "Keyword: Data Mining"

Paynter, Robin A., Celia Fiordalisi, Elizabeth Stoeger, et al. A Prospective Comparison of Evidence Synthesis Search Strategies Developed With and Without Text-Mining Tools. Agency for Healthcare Research and Quality (AHRQ), 2021. http://dx.doi.org/10.23970/ahrqepcmethodsprospectivecomparison.

Abstract:

Background: In an era of explosive growth in biomedical evidence, improving systematic review (SR) search processes is increasingly critical. Text-mining tools (TMTs) are a potentially powerful resource to improve and streamline search strategy development. Two types of TMTs are especially of interest to searchers: word frequency (useful for identifying the most frequently used keyword terms, e.g., PubReMiner) and clustering (visualizing common themes, e.g., Carrot2). Objectives: The objectives of this study were to compare the benefits and trade-offs of searches with and without the use of TMTs for evidence synthesis products in real-world settings. Specific questions included: (1) Do TMTs decrease the time spent developing search strategies? (2) How do TMTs affect the sensitivity and yield of searches? (3) Do TMTs identify groups of records that can be safely excluded in the search evaluation step? (4) Does the complexity of a systematic review topic affect TMT performance? In addition to quantitative data, we collected librarians' comments on their experiences using TMTs to explore when and how these new tools may be useful in systematic review search creation. Methods: In this prospective comparative study, we included seven SR projects and classified them as having simple or complex topics. The project librarian used conventional “usual practice” (UP) methods to create the MEDLINE search strategy, while a paired TMT librarian simultaneously and independently created a search strategy using a variety of TMTs. TMT librarians could choose one or more freely available TMTs per category from a pre-selected list in each of three categories: (1) keyword/phrase tools: AntConc, PubReMiner; (2) subject term tools: MeSH on Demand, PubReMiner, Yale MeSH Analyzer; and (3) strategy evaluation tools: Carrot2, VOSviewer.
We collected results from both MEDLINE searches (with and without TMTs), coded every citation’s origin (UP or TMT respectively), deduplicated them, and then sent the citation library to the review team for screening. When the draft report was submitted, we used the final list of included citations to calculate the sensitivity, precision, and number-needed-to-read for each search (with and without TMTs). Separately, we tracked the time spent on various aspects of search creation by each librarian. Simple and complex topics were analyzed separately to provide insight into whether TMTs could be more useful for one type of topic or another. Results: Across all reviews, UP searches seemed to perform better than TMT, but because of the small sample size, none of these differences was statistically significant. UP searches were slightly more sensitive (92% [95% confidence intervals (CI) 85–99%]) than TMT searches (84.9% [95% CI 74.4–95.4%]). The mean number-needed-to-read was 83 (SD 34) for UP and 90 (SD 68) for TMT. Keyword and subject term development using TMTs generally took less time than those developed using UP alone. The average total time was 12 hours (SD 8) to create a complete search strategy by UP librarians, and 5 hours (SD 2) for the TMT librarians. TMTs neither affected search evaluation time nor improved identification of exclusion concepts (irrelevant records) that can be safely removed from the search set. Conclusion: Across all reviews but one, TMT searches were less sensitive than UP searches. For simple SR topics (i.e., single indication–single drug), TMT searches were slightly less sensitive, but reduced time spent in search design. For complex SR topics (e.g., multicomponent interventions), TMT searches were less sensitive than UP searches; nevertheless, in complex reviews, they identified unique eligible citations not found by the UP searches. TMT searches also reduced time spent in search strategy development. 
For all evidence synthesis types, TMT searches may be more efficient in reviews where comprehensiveness is not paramount, or as an adjunct to UP for evidence syntheses, because they can identify unique includable citations. If TMTs were easier to learn and use, their utility would be increased.
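
The three measures the study reports can be computed directly from sets of citation identifiers. The sketch assumes the usual definitions, which this abstract does not spell out: sensitivity = included citations found ÷ all included citations; precision = included citations found ÷ records retrieved; and number-needed-to-read = 1 ÷ precision. The function name and the IDs in the example are hypothetical illustrations, not study data.

```python
def search_metrics(retrieved, included):
    """Evaluate one search against the final list of included citations.

    retrieved: set of citation IDs returned by the search
    included:  set of citation IDs in the final report
    Returns (sensitivity, precision, number_needed_to_read).
    """
    hits = len(retrieved & included)  # included citations the search actually found
    sensitivity = hits / len(included) if included else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    # Records screened per included citation found (= 1 / precision).
    nnr = len(retrieved) / hits if hits else float("inf")
    return sensitivity, precision, nnr
```

For instance, a made-up search that retrieves 200 records and finds 5 of the 6 citations eventually included has a sensitivity of 5/6 ≈ 0.83, a precision of 0.025, and a number-needed-to-read of 40.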
