Journal articles on the topic 'Heterogeneous Textual Data Mining'

Consult the top 50 journal articles for your research on the topic 'Heterogeneous Textual Data Mining.'

1

Brahme, Ashwini. "Association Rule Mining and Information Retrieval Using Stemming and Text Mining Techniques." Journal of Information Systems Engineering and Management 10, no. 18s (March 11, 2025): 622–28. https://doi.org/10.52783/jisem.v10i18s.2958.

Abstract:
Mining heterogeneous, complex, and enormous data plays a significant role in today’s big data scenario all over the globe. The research paper is directed toward natural language processing, mining of textual data, and pattern discovery through association rule mining. The research aims at mining digital news about epidemic diseases and generating the hidden patterns from the corpus data. The present study also aims to develop a knowledge discovery system for healthcare that predicts epidemic viral diseases and their related measures, which will help healthcare experts, doctors, healthcare organizations, and governments take precautionary measures. The study is designed for predictive analytics of epidemic diseases and their patterns using association rule mining. The precautionary measures for healthcare and the geographical locations most affected by widespread diseases are generated through the proposed system.
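The abstract does not spell out the authors' algorithm. As a rough illustration of association rule mining over text, the minimal sketch below computes support and confidence for pairs of terms. The toy documents, thresholds, and function names are hypothetical, and the tokens are assumed to be already stemmed.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for one-to-one rules lhs -> rhs."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue  # the pair is too rare to yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(transactions, {lhs})
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found

# Hypothetical stemmed terms extracted from four news items.
docs = [
    {"dengue", "monsoon", "fever"},
    {"dengue", "monsoon"},
    {"dengue", "fever"},
    {"flu", "fever"},
]
found = pair_rules(docs, min_support=0.5, min_confidence=0.7)
```

With these toy inputs the only rule that passes both thresholds is "monsoon" implies "dengue"; a full Apriori implementation would also consider itemsets larger than pairs.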
2

Ali, Wajid, Wanli Zuo, Rahman Ali, Xianglin Zuo, and Gohar Rahman. "Causality Mining in Natural Languages Using Machine and Deep Learning Techniques: A Survey." Applied Sciences 11, no. 21 (October 27, 2021): 10064. http://dx.doi.org/10.3390/app112110064.

Abstract:
The era of big textual corpora and machine learning technologies has paved the way for researchers in numerous data mining fields. Among them, causality mining (CM) from textual data has become a significant area of concern and has attracted more attention from researchers. Causality (cause-effect relations) serves as an essential category of relationships, which plays a significant role in question answering, future event prediction, discourse comprehension, decision making, future scenario generation, medical text mining, behavior prediction, and textual prediction entailment. However, despite decades of development, CM techniques still leave room for performance improvement, especially for ambiguous and implicitly expressed causalities. The ineffectiveness of the early attempts is mainly due to small, ambiguous, heterogeneous, and domain-specific datasets constructed using manual linguistic and syntactic rules. Many researchers have deployed shallow machine learning (ML) and deep learning (DL) techniques to deal with such datasets, and they achieved satisfactory performance. In this survey, an effort has been made to provide a comprehensive review of some state-of-the-art shallow ML and DL approaches in CM. We present a detailed taxonomy of CM and discuss popular ML and DL approaches with their comparative weaknesses and strengths, applications, popular datasets, and frameworks. Lastly, the future research challenges are discussed with illustrations of how to transform them into productive future research directions.
3

Makarevich, T. I. "Intellectual Analysis of Textual Information in Domain Fields in the System of e-Government." Digital Transformation, no. 2 (August 6, 2019): 46–52. http://dx.doi.org/10.38086/2522-9613-2019-2-46-52.

Abstract:
This paper considers the application of data mining technology in scientific research as one of the intellectual analysis methods in the domain field of e-Government. The issue is topical because no research of this kind currently exists in the Republic of Belarus. The paper illustrates how the RapidMiner software package and the R language have been applied in text mining. Concept indexing is recognized as the most productive form of analyzing domain field ontologies. Formal and linguistic approaches are found most effective in analyzing domain field ontologies. The paper identifies the problems of word redundancy and word polysemy. Further research is expected to address the interconnection of specialized ontologies that study heterogeneous terms on the basis of artificial intelligence (AI).
4

Dérozier, Sandra, Robert Bossy, Louise Deléger, Mouhamadou Ba, Estelle Chaix, Olivier Harlé, Valentin Loux, Hélène Falentin, and Claire Nédellec. "Omnicrobe, an open-access database of microbial habitats and phenotypes using a comprehensive text mining and data fusion approach." PLOS ONE 18, no. 1 (January 20, 2023): e0272473. http://dx.doi.org/10.1371/journal.pone.0272473.

Abstract:
The dramatic increase in the number of microbe descriptions in databases, reports, and papers presents a two-fold challenge for accessing the information: integration of heterogeneous data in a standard ontology-based representation and normalization of the textual descriptions by semantic analysis. Recent text mining methods offer powerful ways to extract textual information and generate ontology-based representation. This paper describes the design of the Omnicrobe application that gathers comprehensive information on habitats, phenotypes, and usages of microbes from scientific sources of high interest to the microbiology community. The Omnicrobe database contains around 1 million descriptions of microbe properties. These descriptions are created by analyzing and combining six information sources of various kinds, i.e. biological resource catalogs, sequence databases and scientific literature. The microbe properties are indexed by the Ontobiotope ontology and their taxa are indexed by an extended version of the taxonomy maintained by the National Center for Biotechnology Information. The Omnicrobe application covers all domains of microbiology. With simple or rich ontology-based queries, it provides easy-to-use support in the resolution of scientific questions related to the habitats, phenotypes, and uses of microbes. We illustrate the potential of Omnicrobe with a use case from the food innovation domain.
5

Farimani, Saeede Anbaee, Majid Vafaei Jahan, and Amin Milani Fard. "From Text Representation to Financial Market Prediction: A Literature Review." Information 13, no. 10 (September 29, 2022): 466. http://dx.doi.org/10.3390/info13100466.

Abstract:
News dissemination in social media causes fluctuations in financial markets. (Scope) Recent advanced methods in deep learning-based natural language processing have shown promising results in financial market analysis. However, understanding how to leverage large amounts of textual data alongside financial market information is important for the investors’ behavior analysis. In this study, we review over 150 publications in the field of behavioral finance that jointly investigated natural language processing (NLP) approaches and a market data analysis for financial decision support. This work differs from other reviews by focusing on applied publications in computer science and artificial intelligence that contributed to a heterogeneous information fusion for the investors’ behavior analysis. (Goal) We study various text representation methods, sentiment analysis, and information retrieval methods from heterogeneous data sources. (Findings) We present current and future research directions in text mining and deep learning for correlation analysis, forecasting, and recommendation systems in financial markets, such as stocks, cryptocurrencies, and Forex (Foreign Exchange Market).
6

Tan, Weiyan. "ESG Performance Prediction and Driver Factor Mining for Listed Companies Based on Machine Learning: A Multi-Source Heterogeneous Data Fusion Analysis." Science, Technology and Social Development Proceedings Series 1 (March 21, 2025): 349–56. https://doi.org/10.70088/tmzjct41.

Abstract:
With the acceleration of global economic integration and the growing focus on sustainable development, Environmental, Social, and Governance (ESG) factors have become key standards for evaluating a company's long-term value and risk. However, accurately measuring the ESG performance of listed companies and identifying the underlying driving factors remains a significant challenge. This paper proposes a Transformer-based multi-source heterogeneous data fusion model, MSformer, which analyzes diverse data, including financial reports, news, social media comments, and government announcements. It categorizes the data into three types: time-series structured data, time-series structured mapped data, and textual data. The model enhances feature extraction using the Spatial Frequency-coordinated Attention Mechanism (SFHA) and employs Support Vector Regression (SVR) for prediction. Experimental results show that MSformer outperforms other advanced models, achieving an outstanding 87.4% multi-class accuracy and 0.517 average prediction error, proving its effectiveness and advantage in ESG prediction.
7

Mikhnenko, Pavel. "Transformation of the largest Russian companies’ business vocabulary in annual reports: Data Mining." Upravlenets 13, no. 5 (November 3, 2022): 17–33. http://dx.doi.org/10.29141/2218-5003-2022-13-5-2.

Abstract:
One of the promising areas of business analysis is the development of new methods and tools for accounting of nonfinancial and non-numeric information. There is a significant number of theoretical and practical solutions in this field; however, the issues of the transformation dynamics of companies’ business vocabulary need to be studied more extensively. The article aims to identify and interpret latent information reflecting strategic guidelines and conditions for the economic development of Russian enterprises. The methodology of the study is based on the concepts of narrative economics and multimodal business analytics, which is a system of scientific-practical methods for analyzing the activities of economic entities through the use of data from heterogeneous sources. The Data Mining methods and tools for analyzing and systematizing large volumes of textual information were used. The data for research were retrieved from the annual reports of the largest Russian companies for 2018–2020. Among the main indicators of the business vocabulary transformation considered in the paper are the occurrence of unique key tokens (UKTs) and the dynamics of its change, as well as the main contexts of UKTs relevant to the problem of development. The findings indicate noticeable changes in the vocabulary of Russian companies’ annual reports, such as a decline in covering formal aspects of economic activity and a growing debate on the development in the presence of risk. It is shown that these trends were most clearly manifested in the reports of metallurgical and energy enterprises. The research results can serve as a basis for enhancing the analytical and predictive effectiveness of modern business analysis.
8

Peng, Hao, Jianxin Li, Yangqiu Song, Renyu Yang, Rajiv Ranjan, Philip S. Yu, and Lifang He. "Streaming Social Event Detection and Evolution Discovery in Heterogeneous Information Networks." ACM Transactions on Knowledge Discovery from Data 15, no. 5 (June 26, 2021): 1–33. http://dx.doi.org/10.1145/3447585.

Abstract:
Events are happening in real world and real time, which can be planned and organized for occasions, such as social gatherings, festival celebrations, influential meetings, or sports activities. Social media platforms generate a lot of real-time text information regarding public events with different topics. However, mining social events is challenging because events typically exhibit heterogeneous texture and metadata are often ambiguous. In this article, we first design a novel event-based meta-schema to characterize the semantic relatedness of social events and then build an event-based heterogeneous information network (HIN) integrating information from external knowledge base. Second, we propose a novel Pairwise Popularity Graph Convolutional Network, named as PP-GCN, based on weighted meta-path instance similarity and textual semantic representation as inputs, to perform fine-grained social event categorization and learn the optimal weights of meta-paths in different tasks. Third, we propose a streaming social event detection and evolution discovery framework for HINs based on meta-path similarity search, historical information about meta-paths, and heterogeneous DBSCAN clustering method. Comprehensive experiments on real-world streaming social text data are conducted to compare various social event detection and evolution discovery algorithms. Experimental results demonstrate that our proposed framework outperforms other alternative social event detection and evolution discovery techniques.
9

Huang, Ru, Zijian Chen, Jianhua He, and Xiaoli Chu. "Dynamic Heterogeneous User Generated Contents-Driven Relation Assessment via Graph Representation Learning." Sensors 22, no. 4 (February 11, 2022): 1402. http://dx.doi.org/10.3390/s22041402.

Abstract:
Cross-domain decision-making systems face a huge challenge from the rapidly emerging, uneven quality of user-generated data, which places a heavy responsibility on online platforms. Current content analysis methods primarily concentrate on non-textual contents, such as images and videos themselves, while ignoring the interrelationship between each user post’s contents. In this paper, we propose a novel framework named community-aware dynamic heterogeneous graph embedding (CDHNE) for relationship assessment, capable of mining heterogeneous information, latent community structure and dynamic characteristics from user-generated contents (UGC), which aims to solve complex non-Euclidean structured problems. Specifically, we introduce the Markov-chain-based metapath to extract heterogeneous contents and semantics in UGC. An edge-centric attention mechanism is elaborated for localized feature aggregation. Thereafter, we obtain the node representations from a micro perspective and apply them to the discovery of global structure by a clustering technique. In order to uncover the temporal evolutionary patterns, we devise an encoder–decoder structure, containing multiple recurrent memory units, which helps to capture the dynamics for relation assessment efficiently and effectively. Extensive experiments on four real-world datasets are conducted in this work, which demonstrate that CDHNE outperforms other baselines due to the comprehensive node representation, while also exhibiting the superiority of CDHNE in relation assessment. The proposed model is presented as a method of breaking down the barriers between traditional UGC analysis and their abstract network analysis.
10

Williams, Lowri, Eirini Anthi, Laura Arman, and Pete Burnap. "Topic Modelling: Going beyond Token Outputs." Big Data and Cognitive Computing 8, no. 5 (April 25, 2024): 44. http://dx.doi.org/10.3390/bdcc8050044.

Abstract:
Topic modelling is a text mining technique for identifying salient themes from a number of documents. The output is commonly a set of topics consisting of isolated tokens that often co-occur in such documents. Manual effort is often associated with interpreting a topic’s description from such tokens. However, from a human’s perspective, such outputs may not adequately provide enough information to infer the meaning of the topics; thus, their interpretability is often inaccurately understood. Although several studies have attempted to automatically extend topic descriptions as a means of enhancing the interpretation of topic models, they rely on external language sources that may become unavailable, must be kept up to date to generate relevant results, and present privacy issues when training on or processing data. This paper presents a novel approach towards extending the output of traditional topic modelling methods beyond a list of isolated tokens. This approach removes the dependence on external sources by using the textual data themselves by extracting high-scoring keywords and mapping them to the topic model’s token outputs. To compare how the proposed method benchmarks against the state of the art, a comparative analysis against results produced by Large Language Models (LLMs) is presented. Such results report that the proposed method resonates with the thematic coverage found in LLMs and often surpasses such models by bridging the gap between broad thematic elements and granular details. In addition, to demonstrate and reinforce the generalisation of the proposed method, the approach was further evaluated using two other topic modelling methods as the underlying models and when using a heterogeneous unseen dataset. 
To measure the interpretability of the proposed outputs against those of the traditional topic modelling approach, independent annotators manually scored each output based on their quality and usefulness as well as the efficiency of the annotation task. The proposed approach demonstrated higher quality and usefulness, as well as higher efficiency in the annotation task, in comparison to the outputs of a traditional topic modelling method, demonstrating an increase in their interpretability.
11

Yasuda, Akio. "Reviewing "Text Mining": Textual Data Mining." IEEJ Transactions on Electronics, Information and Systems 125, no. 5 (2005): 682–89. http://dx.doi.org/10.1541/ieejeiss.125.682.
12

Yassir, Ali Hameed, Ali A. Mohammed, Adel Abdul-Jabbar Alkhazraji, Mustafa Emad Hameed, Mohammed Saad Talib, and Mohanad Faeq Ali. "Sentimental classification analysis of polarity multi-view textual data using data mining techniques." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 5 (October 1, 2020): 5526. http://dx.doi.org/10.11591/ijece.v10i5.pp5526-5534.

Abstract:
The data and information available in most community environments are complex in nature. Sentimental data resources may consist of textual data collected from multiple information sources with different representations and usually handled by different analytical models. These data resource characteristics can form multi-view polarity textual data. However, knowledge creation from this type of sentimental textual data requires considerable analytical effort and capability. In particular, data mining practices can provide exceptional results in handling textual data formats. Moreover, when the textual data exist in multi-view or unstructured formats, hybrid and integrated analysis with text data mining algorithms is vital to get helpful results. The objective of this research is to enhance knowledge discovery from sentimental multi-view textual data, which can be considered an unstructured data format, in order to classify the polarity information documents into two different categories of useful information. A proposed framework with integrated data mining algorithms is discussed in this paper, which is achieved through the application of the X-means algorithm for clustering and the HotSpot algorithm for association rules. The analysis results show improved accuracy in classifying the sentimental multi-view textual data into two categories when the proposed framework is applied to an online polarity user-review dataset on given topics.
13

Eltaher, Mohammed, and Jeongkyu Lee. "Social User Mining." International Journal of Multimedia Data Engineering and Management 4, no. 4 (October 2013): 58–70. http://dx.doi.org/10.4018/ijmdem.2013100104.

Abstract:
In recent years, the pervasive use of social media has generated huge amounts of data that have started to gain a lot of attention. Each social media source utilizes different data types such as textual and visual. For example, Twitter is for short text messages, Flickr is for images and videos, and Facebook allows all of these data types. It is highly desirable to find patterns of social media users from such different data formats. With the use of data mining techniques, social media data open a lot of opportunities for researchers. Despite its short history, social media mining has become a very active research area. This paper provides a comprehensive survey of recent research on social user mining. In particular, the survey focuses on two aspects: (1) social user mining based on data types, such as textual, visual, and both textual and visual information, and (2) social user mining based on mining techniques. In addition, we present our current research on social user mining as well as its future directions.
14

Wang, Xinke, Kai Li, and Xiaoling Li. "Heterogeneous High Performance Data Mining System for Intelligent Data." Scalable Computing: Practice and Experience 25, no. 4 (June 16, 2024): 2636–44. http://dx.doi.org/10.12694/scpe.v25i4.2927.

Abstract:
In order to improve the utilization rate of internet data under heterogeneous distribution, increase the diversified usage functions and data transmission rate of the internet, and reduce the running time of the internet, it is necessary to mine internet data under heterogeneous distribution. The author proposes an ontology-based optimization method for internet data mining under heterogeneous distribution. This method first preprocesses internet data under heterogeneous distribution and uses a feature selection decision system to select features from the mining data. Based on this, information entropy is used to filter internet data under heterogeneous distribution. During the filtering process, the theoretical values filtered by information entropy are reduced to obtain the optimal data filtering value. Finally, based on the data information obtained in preprocessing, the iterative calculation results of the information gain value in the decision tree generation algorithm are used to mine internet data under heterogeneous distribution with high precision. The simulation results demonstrate that the proposed method improves the flexibility of internet data operations under heterogeneous distribution, increases the recyclability of internet data, and makes internet operations under heterogeneous distribution more concise and efficient, providing a strong basis for research and development in this field.
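The abstract mentions information entropy and the information gain used in decision tree generation without giving formulas. The sketch below shows the textbook definitions only, not the paper's method; the toy records, labels, and function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on `feature`."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical records with two candidate features and a binary label.
rows = [
    {"source": "log", "format": "text"},
    {"source": "log", "format": "json"},
    {"source": "web", "format": "text"},
    {"source": "web", "format": "json"},
]
labels = ["keep", "keep", "drop", "drop"]
gains = {f: information_gain(rows, labels, f) for f in ("source", "format")}
best_feature = max(gains, key=gains.get)
```

In this toy data "source" separates the labels perfectly (gain of 1 bit) while "format" carries no information (gain of 0), so a decision tree builder using this criterion would split on "source" first.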
15

Zhang, Jingjing, and Yang Chi. "Data Management and Service Mode of Library Based on Data Mining Algorithm." Scientific Programming 2022 (September 21, 2022): 1–12. http://dx.doi.org/10.1155/2022/2414830.

Abstract:
Data management for large-scale data library services with mining procedures improves the availability and readiness of heterogeneous sources. The heterogeneous data sources are assimilated as a single entity through mining procedures to meet the data demands. This article introduces connectivity-persistent data mining method (CDMM) to improve the data handling precision with boosting availability. The proposed method relies on federated learning for identifying the service demands, thereby providing data mining. The learning paradigm accumulates information on shared data library existence over various services. Based on the availability, further mining demands are forwarded to the data management system. If the existence verified by the federated learning is adaptable, then sharing-enabled mining is endorsed for the connected users. The data management then augments several heterogeneous shared libraries to meet the mining requirements. This process is reversible based on the service mode and existence. Therefore, the proposed method improves data availability with less mining and access time and fewer failures.
16

Jayasudha, J., and A. Christina Esther. "Mining Sequential Pattern of Data in Textual Document Using Data Mining Classification Technique." Asian Journal of Computer Science and Technology 8, S1 (February 5, 2019): 41–45. http://dx.doi.org/10.51983/ajcst-2019.8.s1.1961.

Abstract:
Text documents are transmitted over the internet for text communication. This causes many problems, such as repeated text, because the same data are provided on the internet. Characterizing and extracting such text is a critical task for researchers. Many researchers have characterized sequential data and applied the results in real-life scenarios, such as real-time monitoring of abnormal user behaviors. Detecting and characterizing the personalized behavior of a user in this setting has some drawbacks. To solve this problem, this paper analyzes the sequential data and characterizes user behavior with the help of a data mining sequential pattern matching algorithm.
17

Siron, Chris R., John F. H. Thompson, Tim Baker, Robert Darling, and Gregory Dipple. "Origin of Au-Rich Carbonate-Hosted Replacement Deposits of the Kassandra Mining District, Northern Greece: Evidence for Late Oligocene, Structurally Controlled, and Zoned Hydrothermal Systems." Economic Geology 114, no. 7 (November 1, 2019): 1389–414. http://dx.doi.org/10.5382/econgeo.4664.

Abstract:
The Au-rich polymetallic massive sulfide orebodies of the Kassandra mining district belong to the intrusion-related carbonate-hosted replacement deposit class. Marble lenses contained within the Stratoni fault zone host the Madem Lakkos and Mavres Petres deposits at the eastern end of the fault system, where paragenetically early skarn and massive sulfide are spatially associated with late Oligocene aplitic and porphyritic dikes. Skarn transitions into predominant massive and banded replacement sulfide bodies, which are overprinted by a younger assemblage of boulangerite-bearing, quartz-rich sulfide and late quartz-rhodochrosite vein breccias. The latter style of mineralization is most abundant at the Piavitsa prospect at the western end of the exposed fault system. The sulfide orebodies at the Olympias deposit are hosted by marble in association with the Kassandra fault, where textural and mineralogical similarities to the sulfide bodies within the Stratoni fault zone suggest a genetic relationship. Estimated trapping temperatures and pressures based on fluid inclusion data indicate that carbonate replacement mineralization took place at depths less than about 5.9 km. Carbon and oxygen isotope patterns in carbonate from the Stratoni fault zone support isotopic exchange principally through fluid–wall-rock interaction, whereas decarbonation and fluid-rock exchange reactions were important at the Olympias deposit. Carbonate minerals associated with skarn and replacement sulfide throughout the district have isotopic compositions that are consistent with formation from a hydrothermal fluid of magmatic origin. Lower homogenization temperatures and salinities in the younger quartz-rich sulfide assemblage and quartz-rhodochrosite vein breccias, together with low δ18O values of gangue carbonate, suggest dilution of a primary magmatic fluid with meteoric water late in the evolution of the hydrothermal system in both the Olympias area and the Stratoni fault zone.
The replacement sulfide orebodies in the district likely inherited their uniform Pb isotope composition from a late Oligocene igneous source and the isotopically heterogeneous metamorphic basement units. Metal distribution patterns at the scale of the Stratoni fault zone show diminishing Cu concentration with decreasing Pb/Zn and Ag/Au ratios from Madem Lakkos to Mavres Petres and the Piavitsa prospect in the west. The sulfide orebodies at the Olympias deposit exhibit elevated Cu values in the east with increasing Pb/Zn and Ag/Au ratios down-plunge to the south-southwest. Metal concentration and ratios support zoning related to temperature and solubility changes with increasing distance from a probable magmatic source. Structural and igneous relationships, together with fluid inclusion microthermometric and carbon-oxygen isotope data and metal distribution patterns, are supportive of a zoned hydrothermal system that exceeded 12 km along the Stratoni fault zone, sourced by an igneous intrusion to the southeast of the Madem Lakkos deposit. The Olympias replacement sulfide orebodies, associated with the Kassandra fault, resulted from a local hydrothermal system that was likely derived from a concealed igneous intrusion to the east of the deposit.
18

Guo, Hongyan, and Xintao Li. "Multisource Target Data Fusion Tracking Method for Heterogeneous Network Based on Data Mining." Wireless Communications and Mobile Computing 2022 (June 10, 2022): 1–10. http://dx.doi.org/10.1155/2022/9291319.

Abstract:
This research concerns a heterogeneous network fusion method for multisource target data based on data mining. First, a distributed storage structure model is built for heterogeneous network multisource target data. Then, using the phase space reconstruction method, a grid distribution structure model for data fusion tracking is constructed, realizing visual scheduling and automatic monitoring of multisource target data. Finally, according to the feature extraction results, the statistical characteristics of multisource target data in heterogeneous networks are analyzed and, combined with the fuzzy tomographic analysis method, multilevel fusion and adaptive mining of multisource target data are performed to extract the associated feature quantities and realize fusion tracking of the data. The simulation results show that, in relatively simple heterogeneous networks, the feature mining error of the proposed method is nearly 2.11% lower than that of the two traditional methods. In relatively complex heterogeneous networks, the feature mining error of the proposed method is nearly 6.48% lower than that of the two traditional methods. This method therefore adapts better to fusion tracking of heterogeneous network multisource target data, has strong anti-interference ability, and improves tracking accuracy in the data fusion tracking process.
19

Chen, Pei Bin, Lan Hu, Hui Yang, Xiang Feng Xue, Chuan Xu Liu, and Xin Jian Li. "Target Value Analysis Based on Data Mining Technology." Applied Mechanics and Materials 602-605 (August 2014): 3096–99. http://dx.doi.org/10.4028/www.scientific.net/amm.602-605.3096.

Abstract:
In this paper, data mining technology and the mining process are explained, and several common methods of data mining are described. Based on the characteristics of the target value, the application of text classification and textual association in target value mining is discussed, and the process model of data mining concerning target value is also presented.
20

HOLZMAN, LARS E., TODD A. FISHER, LEON M. GALITSKY, APRIL KONTOSTATHIS, and WILLIAM M. POTTENGER. "A SOFTWARE INFRASTRUCTURE FOR RESEARCH IN TEXTUAL DATA MINING." International Journal on Artificial Intelligence Tools 13, no. 04 (December 2004): 829–49. http://dx.doi.org/10.1142/s0218213004001843.

Abstract:
Few tools exist that address the challenges facing researchers in the Textual Data Mining (TDM) field. Some are too specific to their application, or are prototypes not suitable for general use. More general tools often are not capable of processing large volumes of data. We have created a Textual Data Mining Infrastructure (TMI) that incorporates both existing and new capabilities in a reusable framework conducive to developing new tools and components. TMI adheres to strict guidelines that allow it to run in a wide range of processing environments – as a result, it accommodates the volume of computing and diversity of research occurring in TDM. A unique capability of TMI is support for optimization. This facilitates text mining research by automating the search for optimal parameters in text mining algorithms. In this article we describe a number of applications that use the TMI. A brief tutorial is provided on the use of TMI. We present several novel results that have not been published elsewhere. We also discuss how the TMI utilizes existing machine-learning libraries, thereby enabling researchers to continue and extend their endeavors with minimal effort. Towards that end, TMI is available on the web at .
21

Ur-Rahman, Nadeem. "Textual Data Mining For Knowledge Discovery and Data Classification: A Comparative Study." European Scientific Journal, ESJ 13, no. 21 (July 31, 2017): 429. http://dx.doi.org/10.19044/esj.2017.v13n21p429.

Abstract:
Business Intelligence solutions are key to enable industrial organisations (either manufacturing or construction) to remain competitive in the market. These solutions are achieved through analysis of data which is collected, retrieved and re-used for prediction and classification purposes. However, many sources of industrial data are not being fully utilised to improve the business processes of the associated industry. It is generally left to the decision makers or managers within a company to take effective decisions based on the information available throughout product design and manufacture or from the operation of business or production processes. Substantial efforts and energy are required in terms of time and money to identify and exploit the appropriate information that is available from the data. Data Mining techniques have long been applied mainly to numerical forms of data available from various data sources but their applications to analyse semi-structured or unstructured databases are still limited to a few specific domains. The applications of these techniques in combination with Text Mining methods based on statistical, natural language processing and visualisation techniques could give beneficial results. Text Mining methods mainly deal with document clustering, text summarisation and classification and mainly rely on methods and techniques available in the area of Information Retrieval (IR). These help to uncover the hidden information in text documents at an initial level. This paper investigates applications of Text Mining in terms of Textual Data Mining (TDM) methods which share techniques from IR and data mining. These techniques may be implemented to analyse textual databases in general but they are demonstrated here using examples of Post Project Reviews (PPR) from the construction industry as a case study. The research is focused on finding key single or multiple term phrases for classifying the documents into two classes, i.e. good information and bad information documents, to help decision makers or project managers to identify key issues discussed in PPRs which can be used as a guide for future project management process.
22

Calado, João Eudes de Souza, José Matias-Pereira, and Abimael de Jesus Barros Costa. "Orange Data Mining." Revista do TCU 154 (December 4, 2024): 172–93. https://doi.org/10.69518/rtcu.154.172-193.

Abstract:
The aim of the study is to analyze, from a data mining perspective, the information in the Integrated Management Report (Relato Integrado de Gestão, RIG) of selected Brazilian accountable units (Unidades Prestadoras de Contas, UPCs) using the Orange Data Mining (ODM) tool. To this end, a qualitative, documentary, and exploratory study was carried out through textual analysis of financial and non-financial data from the RIG of fifteen Brazilian federal universities. Two example analyses are presented, focusing on a single fiscal year, 2019, which may be explored further in future studies given the expected adoption of the International Integrated Reporting Framework (Estrutura Internacional do Relato Integrado, EIRI) by UPCs in the states and municipalities. The results of the practical application of ODM showed that the tool's intuitive features can facilitate analyses by researchers of varied profiles, from beginners to the most experienced, and that it offers an opportunity for different analyses of the RIG data. The contributions of the research may add to the literature on the use of ODM in the public sector and encourage students and public servants, including those in oversight bodies, to practice evaluating RIG information, especially with regard to social control and transparency.
23

Davahli, Mohammad Reza, Waldemar Karwowski, Edgar Gutierrez, Krzysztof Fiok, Grzegorz Wróbel, Redha Taiar, and Tareq Ahram. "Identification and Prediction of Human Behavior through Mining of Unstructured Textual Data." Symmetry 12, no. 11 (November 19, 2020): 1902. http://dx.doi.org/10.3390/sym12111902.

Abstract:
The identification of human behavior can provide useful information across multiple job spectra. Recent advances in applying data-based approaches to social sciences have increased the feasibility of modeling human behavior. In particular, studying human behavior by analyzing unstructured textual data has recently received considerable attention because of the abundance of textual data. The main objective of the present study was to discuss the primary methods for identifying and predicting human behavior through the mining of unstructured textual data. Of the 823 articles analyzed, 87 met the predefined inclusion criteria and were included in the literature review. Our results show that the included articles could be symmetrically classified into two groups. The first group of articles attempted to identify the leading indicators of human behavior in unstructured textual data. In this group, the data-based approaches had three main components: (1) collecting self-reported survey data, (2) collecting data from social media and extracting data features, and (3) applying correlation analysis to evaluate the relationship between two sets of data. In contrast, the second group focused on the accuracy of data-based approaches for predicting human behavior. In this group, the data-based approaches could be categorized into (1) approaches based on labeled unstructured textual data and (2) approaches based on unlabeled unstructured textual data. The review provides a comprehensive insight into unstructured textual data mining to identify and predict human behavior and personality traits.
24

Fize, Jacques, Mathieu Roche, and Maguelonne Teisseire. "Could spatial features help the matching of textual data?" Intelligent Data Analysis 24, no. 5 (September 30, 2020): 1043–64. http://dx.doi.org/10.3233/ida-194749.

Abstract:
Textual data is available to an increasing extent through different media (social networks, companies' data, data catalogues, etc.). New information extraction methods are needed since these new resources are highly heterogeneous. In this article, we propose a text matching process based on spatial features and assessed through heterogeneous textual data. Besides being compatible with heterogeneous data, it comprises two contributions: first, spatial information is extracted for comparison purposes and subsequently stored in a dedicated spatial textual representation (STR); and then two transformations are applied on STR to improve the spatial similarity estimation. This article outlines the proposed approach with new contributions: (i) a new geocoding method using general co-occurrences between entities, (ii) a thorough evaluation, and (iii) an in-depth discussion. The results obtained on two corpora demonstrate that good spatial matches (≈ 80% precision on major criteria) can be obtained between the most similar STRs with further enhancement achieved via STR transformation.
25

Rana, Deepak Singh. "Generating Document Summary using Data Mining and Clustering Techniques." Mathematical Statistician and Engineering Applications 70, no. 1 (January 31, 2021): 285–92. http://dx.doi.org/10.17762/msea.v70i1.2310.

Abstract:
This paper presents a novel approach to generating document summaries using data mining and clustering techniques, specifically K-means clustering and bisecting K-means clustering algorithms. With the exponential growth of textual data, there is an increasing need for efficient and accurate summarization techniques to aid users in understanding the key information within large collections of documents. This study explores the potential of data mining and clustering methods in extracting salient features from textual data and producing high-quality summaries. By applying K-means clustering and bisecting K-means clustering algorithms to the preprocessed textual data, the proposed approach groups similar sentences together and selects the most representative sentences from each cluster to form the final summary. The performance of the proposed method is evaluated using standard evaluation metrics, such as precision, recall, and F1-score, and compared with existing summarization techniques. The results demonstrate that the combination of data mining and clustering techniques provides a promising solution for generating accurate and concise document summaries, with potential applications in various domains, such as news aggregation, scientific literature summarization, and social media content analysis.
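The cluster-then-select idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the term-count vectors, the deterministic seeding from the first k sentences, and the function names (`tokenize`, `vectorize`, `kmeans`, `summarize`) are all assumptions made for illustration, and bisecting K-means is not covered.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def vectorize(sentences):
    """Turn each sentence into a term-count vector over a shared vocabulary."""
    counts = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted(set().union(*counts))
    return [[c[w] for w in vocab] for c in counts]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20):
    """Plain K-means; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        labels = [min(range(k), key=lambda j: distance(v, centroids[j]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def summarize(sentences, k=2):
    """Select, for each cluster, the sentence closest to the cluster centroid."""
    vectors = vectorize(sentences)
    labels, centroids = kmeans(vectors, k)
    chosen = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            chosen.append(min(members,
                              key=lambda i: distance(vectors[i], centroids[j])))
    # Keep the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, k)` on a list of sentence strings returns at most k sentences, one per non-empty cluster. A realistic pipeline would likely use TF-IDF weighting and a better seeding strategy than taking the first k sentences.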
26

Wang, Yong Jiao. "Multi Dimension Knowledge Mining in Heterogeneous Data Resources." Advanced Materials Research 433-440 (January 2012): 5256–62. http://dx.doi.org/10.4028/www.scientific.net/amr.433-440.5256.

Abstract:
Studying heterogeneous network environments and data source objects, this paper explores the feasibility of model-based knowledge discovery across a variety of data. The approach is validated on data, showing how network data mining resources can be used effectively to obtain potentially valuable domain knowledge. The main research activities are: (1) presenting a domain knowledge mining model for the network environment; (2) presenting a new probabilistic topic model, the Topic-Author model; and (3) presenting a blog knowledge framework for analyzing the diffusion of ideas on a mined theme. The results show that the proposed domain knowledge mining method can find a large amount of valuable, potentially multi-dimensional knowledge and knowledge applications. It can therefore provide users with a variety of services and support knowledge acquisition and learning in the information age.
27

Huang, Haiyang, and Zhanlei Shang. "Fast mining method of network heterogeneous fault tolerant data based on K-means clustering." Web Intelligence 19, no. 1-2 (December 3, 2021): 115–24. http://dx.doi.org/10.3233/web-210460.

Abstract:
In the traditional network heterogeneous fault-tolerant data mining process, there are problems such as low accuracy and slow speed. This paper proposes a fast mining method based on K-means clustering for network heterogeneous fault-tolerant data. The confidence space of heterogeneous fault-tolerant data is determined, and the range of motion of fault-tolerant data is obtained. The singular value decomposition (SVD) method is used to construct the classified data model and obtain the characteristics of heterogeneous fault-tolerant data. Redundant data in the fault-tolerant data is deleted by an unsupervised feature selection algorithm, and the sum of squares and Euclidean distance of the fault-tolerant data clustering centers are determined by the K-means algorithm. The discrete data clustering space is constructed and the optimal objective function of network heterogeneous fault-tolerant data clustering is obtained, realizing fast mining of fault-tolerant data. The results show that the mining accuracy of the proposed method can reach 97%.
28

Radinsky, K., S. Davidovich, and S. Markovitch. "Learning to Predict from Textual Data." Journal of Artificial Intelligence Research 45 (December 26, 2012): 641–84. http://dx.doi.org/10.1613/jair.3865.

Abstract:
Given a current news event, we tackle the problem of generating plausible predictions of future events it might cause. We present a new methodology for modeling and predicting such future news events using machine learning and data mining techniques. Our Pundit algorithm generalizes examples of causality pairs to infer a causality predictor. To obtain precisely labeled causality examples, we mine 150 years of news articles and apply semantic natural language modeling techniques to headlines containing certain predefined causality patterns. For generalization, the model uses a vast number of world knowledge ontologies. Empirical evaluation on real news articles shows that our Pundit algorithm performs as well as non-expert humans.
29

Fernandez-Escribano, Gerardo, Jens Bialkowski, Jose A. Gamez, Hari Kalva, Pedro Cuenca, Luis Orozco-Barbosa, and André Kaup. "Low-Complexity Heterogeneous Video Transcoding Using Data Mining." IEEE Transactions on Multimedia 10, no. 2 (February 2008): 286–99. http://dx.doi.org/10.1109/tmm.2007.911838.

30

Urmela, S., and M. Nandhini. "A framework for distributed data mining heterogeneous classifier." Computer Communications 147 (November 2019): 58–75. http://dx.doi.org/10.1016/j.comcom.2019.08.010.

31

Wild, David J. "Mining large heterogeneous data sets in drug discovery." Expert Opinion on Drug Discovery 4, no. 10 (August 28, 2009): 995–1004. http://dx.doi.org/10.1517/17460440903233738.

32

Wu, Qian, Guang-Tai Liang, Qian-Xiang Wang, and Hong Mei. "Mining Effective Temporal Specifications from Heterogeneous API Data." Journal of Computer Science and Technology 26, no. 6 (November 2011): 1061–75. http://dx.doi.org/10.1007/s11390-011-1201-0.

33

M. Karthica and Dr. K. Meenakshi Sundaram. "A Comparative Analysis of Text Mining Techniques and Algorithms." International Journal for Modern Trends in Science and Technology 9, no. 01 (January 25, 2023): 54–61. http://dx.doi.org/10.46501/ijmtst0901010.

Abstract:
With abundant technological progress and its colossal consumption, a gigantic quantity of unstructured text data is produced digitally. This type of data holds rich information as well as knowledge. Therefore, in order to extract such knowledge from unstructured text data, a data expert needs to apply mining techniques to textual data. Text mining is the procedure of extracting hidden, previously unidentified, and considerably useful information from unstructured textual data. Web browsers have become a significant tool for making information available at our fingertips. The World Wide Web has filled with information, and it has become tough to retrieve data according to what is required. Text mining is a subdivision of web mining. This paper presents a study of different techniques and patterns of text content mining and the areas that have been influenced by content mining. The web contains efficient, unstructured, partially prearranged, and multimedia data. This paper focuses on text mining techniques and algorithms that help retrieve information from huge data collections using a content-based method.
34

Zia, Amjad, Muzzamil Aziz, Ioana Popa, Sabih Ahmed Khan, Amirreza Fazely Hamedani, and Abdul R. Asif. "Artificial Intelligence-Based Medical Data Mining." Journal of Personalized Medicine 12, no. 9 (August 24, 2022): 1359. http://dx.doi.org/10.3390/jpm12091359.

Abstract:
Understanding published unstructured textual data using traditional text mining approaches and tools is becoming a challenging issue due to the rapid increase in electronic open-source publications. The application of data mining techniques in the medical sciences is an emerging trend; however, traditional text-mining approaches are insufficient to cope with the current upsurge in the volume of published data. Therefore, artificial intelligence-based text mining tools are being developed and used to process large volumes of data and to explore the hidden features and correlations in the data. This review provides a clear-cut and insightful understanding of how artificial intelligence-based data-mining technology is being used to analyze medical data. We also describe a standard process of data mining based on CRISP-DM (Cross-Industry Standard Process for Data Mining) and the most common tools/libraries available for each step of medical data mining.
35

Ramesh, Y. V., and S. Shanmukh Rao. "MOOC Data Analytics through Text Mining-An Innovative Approach to Learning Improvement." International Journal for Research in Applied Science and Engineering Technology 11, no. 6 (June 30, 2023): 4802–6. http://dx.doi.org/10.22214/ijraset.2023.54508.

Abstract:
The COVID-19 pandemic has brought about significant changes in the perception of education, with Massive Open Online Course (MOOC) providers like Coursera witnessing a surge in millions of new user registrations on their platforms. However, despite the prevalence of online review systems in various industries, the MOOC ecosystem lacks a standardized or fully decentralized review system. We believe that there is an opportunity to utilize existing open MOOC reviews to create user-friendly and transparent reviewing systems, enabling learners to easily identify the top courses available. By leveraging the wealth of reviews already available in the MOOC ecosystem, we can create simpler and more transparent systems that empower users to make informed choices about the courses they enrol in. In our research, we conduct an analysis of reviews from the Coursera platform with the specific goal of determining the potential value of using NLP-driven sentiment analysis on textual reviews in providing valuable information to learners. By examining the sentiment expressed in the textual reviews, we aim to evaluate whether this approach can offer meaningful insights to learners in assessing the quality of MOOCs. The results of our research suggest that textual reviews may be a more advantageous choice compared to numeric ratings due to the disadvantages associated with numeric ratings, such as the potential for random or arbitrary selections. Our findings indicate that utilizing sentiment analysis on textual reviews could provide valuable information for learners in evaluating the quality of MOOCs. By relying on the rich and descriptive information conveyed through textual reviews, learners may be better equipped to make informed decisions when selecting courses on platforms like Coursera.
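To make the idea of scoring textual reviews concrete, here is a minimal lexicon-based sketch. It is a deliberately simplified stand-in for the NLP-driven sentiment analysis the abstract mentions, not the authors' method. The word lists and the names `sentiment_score` and `rank_courses` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, hand-picked opinion words; a real study would use a
# validated lexicon or a trained sentiment model instead.
POSITIVE = {"clear", "excellent", "great", "helpful", "engaging"}
NEGATIVE = {"boring", "confusing", "outdated", "poor", "slow"}


def sentiment_score(review):
    """Return a polarity in [-1, 1]; 0.0 when no opinion words are found."""
    tokens = re.findall(r"[a-z]+", review.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def rank_courses(reviews_by_course):
    """Order course names by mean review polarity, highest first."""
    averages = {
        course: sum(sentiment_score(r) for r in reviews) / len(reviews)
        for course, reviews in reviews_by_course.items()
        if reviews  # skip courses without any review
    }
    return sorted(averages, key=averages.get, reverse=True)
```

A call such as `rank_courses({"Course A": [...], "Course B": [...]})` returns the course names ordered by average polarity. Unlike a single star rating, the score is derived from what reviewers actually wrote, which is the contrast the abstract draws.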
36

Kobayashi, Vladimer B., Stefan T. Mol, Hannah A. Berkers, Gábor Kismihók, and Deanne N. Den Hartog. "Text Mining in Organizational Research." Organizational Research Methods 21, no. 3 (August 10, 2017): 733–65. http://dx.doi.org/10.1177/1094428117722619.

Abstract:
Despite the ubiquity of textual data, so far few researchers have applied text mining to answer organizational research questions. Text mining, which essentially entails a quantitative approach to the analysis of (usually) voluminous textual data, helps accelerate knowledge discovery by radically increasing the amount of data that can be analyzed. This article aims to acquaint organizational researchers with the fundamental logic underpinning text mining, the analytical stages involved, and contemporary techniques that may be used to achieve different types of objectives. The specific analytical techniques reviewed are (a) dimensionality reduction, (b) distance and similarity computing, (c) clustering, (d) topic modeling, and (e) classification. We describe how text mining may extend contemporary organizational research by allowing the testing of existing or new research questions with data that are likely to be rich, contextualized, and ecologically valid. After an exploration of how evidence for the validity of text mining output may be generated, we conclude the article by illustrating the text mining process in a job analysis setting using a dataset composed of job vacancies.
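As a small illustration of technique (b), distance and similarity computing, the sketch below weights terms with TF-IDF and compares documents by cosine similarity. The smoothed IDF formula and the function names `tf_idf` and `cosine` are assumptions made for this example and are not taken from the article.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / len(doc))
            * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With tokenized job vacancies as input, two postings that share terms such as "data" and "analyst" score higher against each other than against an unrelated posting. A similarity matrix built this way can then feed the clustering or classification stages the article lists.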
37

Alguliev, Rasim M., Ramiz M. Aliguliyev, and Saadat A. Nazirova. "Classification of Textual E-Mail Spam Using Data Mining Techniques." Applied Computational Intelligence and Soft Computing 2011 (2011): 1–8. http://dx.doi.org/10.1155/2011/416308.

Abstract:
A new method for clustering of spam messages collected in bases of antispam system is offered. The genetic algorithm is developed for solving clustering problems. The objective function is a maximization of similarity between messages in clusters, which is defined by the k-nearest neighbor algorithm. Application of genetic algorithm for solving constrained problems faces the problem of constant support of chromosomes which reduces convergence process. Therefore, for acceleration of convergence of genetic algorithm, a penalty function that prevents occurrence of infeasible chromosomes at ranging of values of function of fitness is used. After classification, knowledge extraction is applied in order to get information about classes. Multidocument summarization method is used to get the information portrait of each cluster of spam messages. Classifying and parametrizing spam templates, it will be also possible to define the thematic dependence from geographical dependence (e.g., what subjects prevail in spam messages sent from certain countries). Thus, the offered system will be capable to reveal purposeful information attacks if those occur. Analyzing origins of the spam messages from collection, it is possible to define and solve the organized social networks of spammers.
38

Aleqabie, Hiba, Mais Saad Sfoq, Rand Abdulwahid Albeer, and Enaam Hadi Abd. "A Review Of Text Mining Techniques: Trends, and Applications In Various Domains." Iraqi Journal For Computer Science and Mathematics 5, no. 1 (January 28, 2024): 125–41. http://dx.doi.org/10.52866/ijcsm.2024.05.01.009.

Abstract:
Text mining, a subfield of natural language processing (NLP), has received considerable attention in recent years due to its ability to extract valuable insights from large volumes of unstructured textual data. This review aims to provide a comprehensive evaluation of the applicability of text mining techniques across various domains and industries. The review starts off with a dialogue of the basic ideas and methodologies that are concerned with textual content mining together with preprocessing, feature extraction, and machine learning algorithms. Furthermore, this survey highlights the challenges faced at some stage in implementing textual content mining strategies. Additionally, the review explores emerging tendencies and possibilities in text-mining research. It discusses advancements in deep learning models for text evaluation, integration with different AI technologies like image or speech recognition for multimodal analysis, utilization of domain-unique ontologies or information graphs for more desirable information of textual facts, and incorporation of explainable AI strategies to improve interpretability. The findings from this overview are analyzed to identify common developments and patterns in text mining packages across extraordinary domain names. The consequences of this paper will advantage researchers by means of imparting updated expertise of modern practices in textual content mining. Additionally, it will manual practitioners in selecting suitable strategies for their unique application domain names while addressing capacity-demanding situations.
39

Mezentseva, Olha O., and Anna S. Kolomiiets. "OPTIMIZATION OF ANALYSIS AND MINIMIZATION OF INFORMATION LOSSES IN TEXT MINING." Herald of Advanced Information Technology 3, no. 1 (April 10, 2020): 373–82. http://dx.doi.org/10.15276/hait.01.2020.4.

Abstract:
Information is one of the most important resources of today's business environment. It is difficult for any company to succeed without having sufficient information about its customers, employees and other key stakeholders. Every day, companies receive unstructured and structured text from a variety of sources, such as survey results, tweets, call center notes, phone emails, online customer reviews, recorded interactions, emails and other documents. These sources provide raw text that is difficult to understand without using the right text analysis tool. Text analytics can be done manually, but the manual process is inefficient. Traditional systems use keywords and cannot read and understand language in emails, tweets, web pages, and text documents. For this reason, companies use text analysis software to analyze large amounts of text data. The software helps users retrieve textual information and act accordingly. Manual annotation is currently the most common approach, which can be attributed to the high quality of annotation and its "meaningfulness". Typical disadvantages of manual annotation systems and textual information analysis systems are the high material costs and the inherently low speed of work. Therefore, the topic of this article is to explore methods for effectively annotating reviews of various products from the largest marketplace in Ukraine. The following tasks should be solved: to analyze modern approaches to data analysis and processing; to study basic algorithms for data analysis and processing; to build a program that will collect data, and to design the program architecture for more efficient use based on the latest technologies; to clean the data using techniques that minimize information loss; to analyze the collected data using data analysis and processing approaches; and to draw conclusions from the results of all the above work. There are quite a number of varieties of the listed tasks, as well as methods of solving them. This again confirms the importance and relevance of the chosen topic. The purpose of the study is the methods and means by which information losses can be minimized when analyzing and processing textual data. The object of the study is the process of minimizing information losses in the analysis and processing of textual data. In the course of the study, recent research on the analysis and processing of textual information was analyzed, along with methods of textual information processing and Data Mining algorithms.
40

Abid, Amal, Salma Jamoussi, and Abdelmajid Ben Hamadou. "AIS-Clus: A Bio-Inspired Method for Textual Data Stream Clustering." Vietnam Journal of Computer Science 06, no. 02 (May 2019): 223–56. http://dx.doi.org/10.1142/s2196888819500143.

Abstract:
The spread of real-time applications has led to a huge amount of data shared between users. This vast volume of data rapidly evolving over time is referred to as data stream. Clustering and processing such data poses many challenges to the data mining community. Indeed, traditional data mining techniques become unfeasible to mine such a continuous flow of data where characteristics, features, and concepts are rapidly changing over time. This paper presents a novel method for data stream clustering. In this context, major challenges of data stream processing are addressed, namely, infinite length, concept drift, novelty detection, and feature evolution. To handle these issues, the proposed method uses the Artificial Immune System (AIS) meta-heuristic. The latter has been widely used for data mining tasks and it owns the property of adaptability required by data stream clustering algorithms. Our method, called AIS-Clus, is able to detect novel concepts using the performance of the learning process of the AIS meta-heuristic. Furthermore, AIS-Clus has the ability to adapt its model to handle concept drift and feature evolution for textual data streams. Experimental results have been performed on textual datasets where efficient and promising results are obtained.
41

Vasseghian, Yasser, Mohammed Berkani, Fares Almomani, and Elena-Niculina Dragoi. "Data mining for pesticide decontamination using heterogeneous photocatalytic processes." Chemosphere 270 (May 2021): 129449. http://dx.doi.org/10.1016/j.chemosphere.2020.129449.

42

Chen, R., K. Sivakumar, and H. Kargupta. "Collective Mining of Bayesian Networks from Distributed Heterogeneous Data." Knowledge and Information Systems 6, no. 2 (March 2004): 164–87. http://dx.doi.org/10.1007/s10115-003-0107-8.

43

Hassani, Hossein, Christina Beneki, Stephan Unger, Maedeh Taj Mazinani, and Mohammad Reza Yeganegi. "Text Mining in Big Data Analytics." Big Data and Cognitive Computing 4, no. 1 (January 16, 2020): 1. http://dx.doi.org/10.3390/bdcc4010001.

Abstract:
Text mining in big data analytics is emerging as a powerful tool for harnessing the power of unstructured textual data by analyzing it to extract new knowledge and to identify significant patterns and correlations hidden in the data. This study seeks to determine the state of text mining research by examining the developments within published literature over past years and provide valuable insights for practitioners and researchers on the predominant trends, methods, and applications of text mining research. In accordance with this, more than 200 academic journal articles on the subject are included and discussed in this review; the state-of-the-art text mining approaches and techniques used for analyzing transcripts and speeches, meeting transcripts, and academic journal articles, as well as websites, emails, blogs, and social media platforms, across a broad range of application areas are also investigated. Additionally, the benefits and challenges related to text mining are also briefly outlined.
44

Islam, Mohammad Rabiul, Imad Fakhri Al-Shaikhli, Rizal Bin Mohd Nor, and Vijayakumar Varadarajan. "Technical Approach in Text Mining for Stock Market Prediction: A Systematic Review." Indonesian Journal of Electrical Engineering and Computer Science 10, no. 2 (May 1, 2018): 770. http://dx.doi.org/10.11591/ijeecs.v10.i2.pp770-777.

Abstract:
Text mining methods and techniques have opened up the mining task across the information retrieval discipline within the field of soft computing. Finding meaningful information in the vast amount of electronic textual data has become a humongous task for trading decisions. This empirical research examines the role of text mining in financial text analysis, where stock prediction models need to improve based on the rank search method. The review focuses on text mining techniques, methods, and principal component analysis, which help reduce dimensionality within the characteristics and optimal features. Moreover, the most sophisticated soft-computing methods and techniques are reviewed in terms of analysis, comparison, and evaluation of their performance on electronic textual data. Given its research significance, this empirical research also highlights the limitations of different strategies and methods with respect to specific aspects of the theoretical framework for enhancing performance.
45

Shyam Mohan, J. S., P. Shanmugapriya, and Bhamidipati Vinay Pawan Kumar. "The Big Data Mining Approach for Finding top rated URL." Journal of Applied Computer Science Methods 7, no. 1 (February 1, 2015): 17–32. http://dx.doi.org/10.1515/jacsm-2015-0007.

Abstract:
Finding the most widely used URLs on online shopping sites for a particular category is difficult because the data sets involved are heterogeneous and multi-dimensional and depend on various factors. Traditional data mining methods are limited to homogeneous data sources, so they fail to sufficiently consider the characteristics of heterogeneous data. This paper presents a consistent Big Data mining search that performs analytics on text data to find the top-rated URLs. Though many heuristic search methods are available, our proposed method solves the search problem when compared with traditional data mining methods. The sample results are obtained in optimal time and are compared with other methods, indicating that the approach is effective and efficient.
46

Pati, A., Y. Jin, K. Klage, R. F. Helm, L. S. Heath, and N. Ramakrishnan. "CMGSDB: integrating heterogeneous Caenorhabditis elegans data sources using compositional data mining." Nucleic Acids Research 36, Database (December 23, 2007): D69–D76. http://dx.doi.org/10.1093/nar/gkm804.

47

Lee, Jae-Seong, and Seung-Pyo Jun. "Privacy-preserving data mining for open government data from heterogeneous sources." Government Information Quarterly 38, no. 1 (January 2021): 101544. http://dx.doi.org/10.1016/j.giq.2020.101544.

48

Критська, Я. О., T. O. Білобородова, and І. С. Скарга-Бандурова. "Data mining techniques for IoT analytics." ВІСНИК СХІДНОУКРАЇНСЬКОГО НАЦІОНАЛЬНОГО УНІВЕРСИТЕТУ імені Володимира Даля, no. 5(253) (September 5, 2019): 53–62. http://dx.doi.org/10.33216/1998-7927-2019-253-5-53-62.

Abstract:
Data mining (DM) is one of the most valuable technologies for identifying unknown patterns and making the Internet of Things (IoT) smarter. The current survey focuses on IoT data and knowledge discovery processes for IoT. In this paper, we present a systematic review of various DM models and discuss the DM techniques applicable to different IoT data. Some data-specific features were analyzed, and algorithms for knowledge discovery in IoT data were considered. Challenges and opportunities for mining multimodal, heterogeneous, noisy, incomplete, unbalanced and biased data as well as massive datasets in IoT are also discussed.
49

Mishra, Kinnari, and Mansi Vegad. "Customer Feedback Analysis Using Text Mining." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 10, no. 2 (April 21, 2024): 636–41. http://dx.doi.org/10.32628/cseit2410238.

Abstract:
Complexity surrounding the holistic nature of customer experience has made measuring customer perceptions of interactive service experiences challenging. At the same time, advances in technology and changes in methods for collecting explicit customer feedback are generating increasing volumes of unstructured textual data, making it difficult for managers to analyze and interpret this information. Consequently, text mining, a method enabling automatic extraction of information from textual data, is gaining in popularity. However, this method has performed below expectations in terms of depth of analysis of customer experience feedback and accuracy. In this study, we advance linguistics-based text mining modeling to inform the process of developing an improved framework. The proposed framework incorporates important elements of customer experience, service methodologies and theories such as co-creation processes, interactions and context. This more holistic approach for analyzing feedback facilitates a deeper analysis of customer feedback experiences, by encompassing three value creation elements: activities, resources, and context (ARC). Empirical results show that the ARC framework facilitates the development of a text mining model for analysis of customer textual feedback that enables companies to assess the impact of interactive service processes on customer experiences. The proposed text mining model shows high accuracy levels and provides flexibility through training. As such, it can evolve to account for changing contexts over time and be deployed across different (service) business domains; we term it an “open learning” model. The ability to timely assess customer experience feedback represents a pre-requisite for successful co-creation processes in a service environment.
50

Ekerete, Idongesit, Matias Garcia-Constantino, Christopher Nugent, Paul McCullagh, and James McLaughlin. "Data Mining and Fusion Framework for In-Home Monitoring Applications." Sensors 23, no. 21 (October 24, 2023): 8661. http://dx.doi.org/10.3390/s23218661.

Abstract:
Sensor Data Fusion (SDT) algorithms and models have been widely used in diverse applications. One of the main challenges of SDT includes how to deal with heterogeneous and complex datasets with different formats. The present work utilised both homogenous and heterogeneous datasets to propose a novel SDT framework. It compares data mining-based fusion software packages such as RapidMiner Studio, Anaconda, Weka, and Orange, and proposes a data fusion framework suitable for in-home applications. A total of 574 privacy-friendly (binary) images and 1722 datasets gleaned from thermal and Radar sensing solutions, respectively, were fused using the software packages on instances of homogeneous and heterogeneous data aggregation. Experimental results indicated that the proposed fusion framework achieved an average Classification Accuracy of 84.7% and 95.7% on homogeneous and heterogeneous datasets, respectively, with the help of data mining and machine learning models such as Naïve Bayes, Decision Tree, Neural Network, Random Forest, Stochastic Gradient Descent, Support Vector Machine, and CN2 Induction. Further evaluation of the Sensor Data Fusion framework based on cross-validation of features indicated average values of 94.4% for Classification Accuracy, 95.7% for Precision, and 96.4% for Recall. The novelty of the proposed framework includes cost and timesaving advantages for data labelling and preparation, and feature extraction.