Journal articles on the topic 'Unstructured data mining'

Consult the top 50 journal articles for your research on the topic 'Unstructured data mining.'

1

Rajalakshmi, Thiruthuraipondi Natarajan. "Data Mining from Unstructured Documents." International Journal on Science and Technology 14, no. 3 (2023): 1–6. https://doi.org/10.5281/zenodo.14631493.

Abstract:
Data mining is the process of identifying and extracting valuable data by scanning through large volumes of structured and unstructured data, which form the base for further processing using data analytics tools for cleansing, categorization, organization, etc. This source data might not fit a certain template and can be of any format, ranging from plain text to media files, and it is the responsibility of the mining process to understand the message, extract relevant information and finally convert it to a standard format. Prior to its final stage, the data undergoes several rounds of cleansing to eliminate irrelevant information and pick the right set of data intended by the organization with the best turnaround time possible. At each stage of the analysis, the data needs to get cleaner and more distinctive and provide a vision as to the areas in which it will be used. This document provides insight into data mining and its potential impact in the market. It explores the various sources and the types of data that might be associated with them, how to cleanse the data, and various ways the information can be used for the development of a retail business. It also provides guidance on pattern recognition and the proper compartmentalization of the data so that it is readily available to the target groups for research and marketing.
2

Olianin, Denys, and Halyna Tsypryk. "Overview of Transformers Role in Data Mining from Unstructured Data." Measuring and Computing Devices in Technological Processes, no. 2 (May 21, 2025): 360–64. https://doi.org/10.31891/2219-9365-2025-82-52.

Abstract:
The rapid growth of Big Data has made it increasingly important to extract meaningful insights from unstructured sources such as text, audio, video, and emails. Traditional data mining techniques—like tokenization, clustering, classification, and association rule mining—have provided a basis for processing these complex data forms. However, they often struggle to capture the subtle semantic and contextual relationships that are inherent in unstructured data. In this article, we examine the limitations of these conventional methods and explore the impact of Transformer Neural Networks (TNNs) on unstructured data mining. Transformer architectures have revolutionized the field by employing self-attention mechanisms and positional encodings, which allow for parallel processing of data. This new approach enables the creation of high-quality embeddings that capture both semantic and syntactic information. As a result, tasks such as sentiment analysis, topic modeling, and automated summarization are significantly enhanced. Additionally, integrating transformers into audio signal processing and email mining has led to notable improvements in automatic speech recognition and semantic analysis, effectively addressing some of the long-standing challenges in these areas. The findings discussed in this article highlight the potential of transformer-based approaches to not only overcome the limitations of traditional data mining methods but also to open the door to innovative applications across various fields. Future research directions include developing more computationally efficient transformer models and exploring hybrid approaches that combine traditional techniques with advanced neural architectures. These efforts will ultimately push the boundaries of what is possible in unstructured data mining.
3

Muhammad, Aoun. "Comparative Analysis of Text Mining Techniques for News Article Summarization." LC International Journal of STEM (ISSN: 2708-7123) 4, no. 1 (2023): 52–63. https://doi.org/10.5281/zenodo.7893329.

Abstract:
Text mining research paper is a scientific study that focuses on the development and application of text mining techniques for extracting valuable information from unstructured textual data. The paper discusses the challenges of working with unstructured data and the need for advanced text mining techniques to address these challenges. The paper outlines the various steps involved in the text mining process, such as data preprocessing, text representation, and feature selection. It discusses the importance of selecting appropriate algorithms for different types of text mining tasks, including text classification, clustering, sentiment analysis, and topic modeling. The paper also discusses the challenges of evaluating text mining models, including issues related to data quality, model performance, and interpretability. It highlights the importance of using appropriate evaluation metrics and techniques to ensure the reliability and validity of the results. Finally, the paper provides case studies and real-world examples of text mining applications in various domains such as healthcare, social media analysis, and financial analysis. It emphasizes the potential of text mining to provide valuable insights and knowledge that can be used to support decision-making in different industries. Overall, the paper highlights the importance of text mining as a powerful tool for analyzing unstructured textual data and provides a comprehensive overview of the key techniques and challenges in this field.
4

Anisha, S., and S. Thiyagarajan Dr. "Analytical Study on Unstructured Data Management in Application Data Base through NLP and Datamining." Analytical Study on Unstructured Data Management in Application Data Base through NLP and Datamining 9, no. 1 (2024): 5. https://doi.org/10.5281/zenodo.10634318.

Abstract:
Business organizations are flooded with large pools of unstructured data. Loading these data into a business database requires many processes. Companies with BPO and KPO operations devote huge resources to converting unstructured data into their software databases through programming, with multiple queries and users. Dealing with such complex situations calls for an automated system, which would save a large amount of time and resources. The aim of the present research was to analyse methodically the technical works relating to the application of data mining, artificial intelligence (AI) and machine learning (ML) in the software industry. This paper combines different disciplines: data mining techniques, ML and NLP. The objective of this paper is to improve an organization's business intelligence process through maximum exploitation of the unstructured data it owns. This paper primarily attempts to examine the applicability of a combination of data mining techniques, NLP and ML in handling unstructured data, reducing the burden on users by minimizing the use of multiple queries and making it user-friendly to extract data from a large database. Keywords: Application Database, Data mining, ML, NLP.
5

Lomotey, Richard K., and Ralph Deters. "Unstructured data mining: use case for CouchDB." International Journal of Big Data Intelligence 2, no. 3 (2015): 168. http://dx.doi.org/10.1504/ijbdi.2015.070597.

6

Reshmy, A. K., and D. Paulraj. "Data mining of unstructured big data in cloud computing." International Journal of Business Intelligence and Data Mining 12, no. 3/4 (2017): 1. http://dx.doi.org/10.1504/ijbidm.2017.10004683.

7

Reshmy, A. K., and D. Paulraj. "Data mining of unstructured big data in cloud computing." International Journal of Business Intelligence and Data Mining 13, no. 1/2/3 (2018): 147. http://dx.doi.org/10.1504/ijbidm.2018.088430.

8

Oh, Tae-Jin, and Anthony. "New and Fast Emerging Advance Structure of Text Mining from Unstructured Data." Bonfring International Journal of Industrial Engineering and Management Science 7, no. 2 (2017): 13–16. http://dx.doi.org/10.9756/bijiems.8325.

9

Singh, Shashi Pal, Ajai Kumar, Rachna Awasthi, Neetu Yadav, and Shikha Jain. "Intelligent Bilingual Data Extraction and Rebuilding Using Data Mining for Big Data." Journal of Computational and Theoretical Nanoscience 17, no. 1 (2020): 513–18. http://dx.doi.org/10.1166/jctn.2020.8699.

Abstract:
In today's world there exist various sources of data in various file formats, structures and types, forming a huge collection of unstructured data over the internet and social media. This gives rise to the categorization of data as unstructured, semi-structured and structured. Data that exist in an irregular manner without any particular schema are referred to as unstructured data, which are very difficult to process because they contain irregularities and ambiguities. So, we focus on an Intelligent Processing Unit which converts unstructured big data into intelligent, meaningful information. Intelligent text extraction is a technique that automatically identifies and extracts text from a file format. The system consists of different stages, which include pre-processing, keyphrase extraction techniques and transformation, to extract text and retrieve structured data from unstructured data. The system combines multiple methods and approaches to give better results. We are currently working with various file formats and converting them into DOCX, which arrives in unstructured form; we then obtain the file in structured form with the help of intelligent pre-processing. The pre-processing stages turn the unstructured data or corpus into meaningful structured data. In the initial stage the system removes stop words, unwanted symbols, noisy data and line spacing. The second stage is data extraction from various sources or types of files into properly formatted plain text. In the third stage we transform the data or information from one format to another so that the user can understand it. The final step is rebuilding the file in its original format while maintaining the tags of the file. Large files are divided into smaller sub-files so that parallel processing algorithms can be executed for fast processing of larger files and data. Parallel processing is a very important concept for text extraction; with its help, a big file is broken into small files and the result improves. Extraction of data is done bilingually and represents the most relevant information contained in the document. Keyphrase extraction is an important problem in data mining, knowledge retrieval and natural speech processing. A keyword extraction technique has been used to abstract keywords that uniquely identify a document. Rebuilding is an important part of this project: the output must be returned in the same file format as the input file. This concept is widely used, but not much work has been done on combining many functionalities in one tool, which motivates the need for a tool that can easily and efficiently convert unstructured files into structured ones.
10

Suneeta, Salimath. "Need of Data Mining in Search Engine Optimization." Journal of Management Commerce Engineering and IT (JMCEI) 1, no. 2 (2022): 9–12. https://doi.org/10.5281/zenodo.7265250.

Abstract:
Analyzing massive data sets to discover novel traffic patterns and uncover market possibilities may be summed up as data mining SEO activity. These specialty trends are then used to target a certain user group more effectively with a service or product. Companies employ data mining as a method to transform unstructured data into information that is valuable. Businesses may learn more about their consumers to create more successful marketing campaigns, boost sales, and cut expenses by employing software to seek for patterns in massive volumes of data. Organizations use data mining to find patterns in data that might provide insights into their operational needs. Both business intelligence and data science require it. Organizations may utilise a variety of data mining approaches to transform unstructured data into insights that can be put to use.
11

Ali, Hameed Yassir, A. Mohammed Ali, Abdul-Jabbar Alkhazraji Adel, Emad Hameed Mustafa, Saad Talib Mohammed, and Faeq Ali Mohanad. "Sentimental classification analysis of polarity multi-view textual data using data mining techniques." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 5 (2020): 5526–34. https://doi.org/10.11591/ijece.v10i5.pp5526-5534.

Abstract:
The data and information available in most community environments is complex in nature. Sentimental data resources may consist of textual data collected from multiple information sources with different representations and usually handled by different analytical models. These types of data resource characteristics can form multi-view polarity textual data. However, knowledge creation from this type of sentimental textual data requires considerable analytical effort and capability. In particular, data mining practices can provide exceptional results in handling textual data formats. Moreover, when the textual data exists in multi-view or unstructured formats, hybrid and integrated analysis with text data mining algorithms is vital to get helpful results. The objective of this research is to enhance knowledge discovery from sentimental multi-view textual data, which can be considered an unstructured data format, in order to classify the polarity information documents into two different categories of useful information. A proposed framework with integrated data mining algorithms is discussed in this paper, which is achieved through the application of the X-means algorithm for clustering and the HotSpot algorithm for association rules. The analysis results have shown improved accuracy in classifying the sentimental multi-view textual data into two categories through the application of the proposed framework to an online polarity user-reviews dataset on given topics.
12

Ahmed, Adeeb Jalal, Ahmed Jasim Abdulrahman, and A. Mahawish Amar. "A web content mining application for detecting relevant pages using Jaccard similarity." International Journal of Electrical and Computer Engineering (IJECE) 12, no. 6 (2022): 6461–71. https://doi.org/10.11591/ijece.v12i6.pp6461-6471.

Abstract:
The tremendous growth in the availability of enormous text data from a variety of sources creates a slew of concerns and obstacles to discovering meaningful information. This advancement of technology in the digital realm has resulted in the dispersion of texts over millions of web sites. Unstructured texts are densely packed with textual information. The discovery of valuable and intriguing relationships in unstructured texts demands more computer processing. So, text mining has developed into an attractive area of study for obtaining organized and useful data. One of the purposes of this research is to discuss text pre-processing of automobile marketing domains in order to create a structured database. Regular expressions were used to extract data from unstructured vehicle advertisements, resulting in a well-organized database. We manually develop unique rule-based ways of extracting structured data from unstructured web pages. As a result of the information retrieved from these advertisements, a systematic search for certain noteworthy qualities is performed. There are numerous approaches for query recommendation, and it is vital to understand which one should be employed. Additionally, this research attempts to determine the optimal value similarity for query suggestions based on user-supplied parameters by comparing MySQL pattern matching and Jaccard similarity.
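The abstract above compares pattern matching with Jaccard similarity for query suggestion. As a rough illustration only, Jaccard similarity between two texts can be sketched as the size of the intersection of their token sets divided by the size of their union. The whitespace tokenization, the sample strings, and the function name `jaccard_similarity` are assumptions made for this sketch and are not taken from the paper.

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two strings."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # convention chosen here for two empty inputs
    return len(tokens_a & tokens_b) / len(union)


# Hypothetical query and advertisement text, invented for illustration.
query = "toyota corolla 2015 automatic"
advert = "Toyota Corolla 2015 manual low mileage"
score = jaccard_similarity(query, advert)  # 3 shared words out of 7 distinct
```

A score of 1.0 means the two word sets are identical and 0.0 means they share nothing, so candidate suggestions can be ranked by this value.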
13

Lomotey, Richard K., and Ralph Deters. "RSenter: terms mining tool from unstructured data sources." International Journal of Business Process Integration and Management 6, no. 4 (2013): 298. http://dx.doi.org/10.1504/ijbpim.2013.059136.

14

Li, Zhihua, Xinye Yu, Tao Wei, and Junhao Qian. "Unstructured Big Data Threat Intelligence Parallel Mining Algorithm." Big Data Mining and Analytics 7, no. 2 (2024): 531–46. http://dx.doi.org/10.26599/bdma.2023.9020032.

15

B. Prasanalakshmi and A. Selvaraj. "A Survey on Accessing Data over Cloud Environment using Data mining Algorithms." Indian Journal of Emerging Electronics in Computer Communications 2, no. 2 (2015): 498–505. https://doi.org/10.5281/zenodo.33111.

Abstract:
In today's world, accessing large sets of data is increasingly complex because the data may be structured or unstructured, in the form of text, images, videos, etc., and cannot be controlled by internet users; this is known as Big Data. Useful data can be extracted from big data with the help of data mining algorithms. Data mining is a technique for determining patterns, classifying data, and clustering within large data sets. In this paper we discuss how large data sets can be accessed through data mining algorithms in a cloud environment.
16

M. Karthica and Dr. K. Meenakshi Sundaram. "A Comparative Analysis of Text Mining Techniques and Algorithms." International Journal for Modern Trends in Science and Technology 9, no. 01 (2023): 54–61. http://dx.doi.org/10.46501/ijmtst0901010.

Abstract:
With abundant technological progress and its colossal consumption, a gigantic quantity of unstructured text data is being produced digitally. This type of data holds rich information as well as knowledge. Therefore, in order to extract such knowledge from unstructured text data, a data expert needs to perform mining techniques over textual data. Text mining is the procedure of extracting hidden, previously unidentified, and considerably useful information from unstructured textual data. Web browsers have become a significant implement for making information available at our fingertips. The World Wide Web has filled with information, and it has become tough to retrieve data according to what is required. Text mining is a subdivision of web mining. This paper deals with a study of different techniques and patterns of content text mining and the areas which have been influenced by content mining. The web contains efficient, unstructured, partially prearranged and multimedia data. This paper focuses on text mining techniques and algorithms which help to retrieve information from large data collections using a content-based method.
17

A R, Anusha. "Novel Approach to Transform Unstructured Healthcare Data to Structured Data." International Journal for Research in Applied Science and Engineering Technology 9, no. VII (2021): 2798–802. http://dx.doi.org/10.22214/ijraset.2021.36972.

Abstract:
With the rapid growth in the number and dimension of databases and database applications in healthcare records, it is necessary to design a system to achieve automatic extraction of facts from huge tables. At the same time, there is a challenge in handling unstructured data, as it is highly difficult to analyze and to extract actionable intelligence from it. Preprocessing is an important task and a critical step in text mining, regular expressions and information retrieval. The acquisition of key data from unstructured data is often difficult. The objective of this project is to transform unstructured healthcare data into structured data, particularly to gain insight and to generate appropriate structured data.
18

Davahli, Mohammad Reza, Waldemar Karwowski, Edgar Gutierrez, et al. "Identification and Prediction of Human Behavior through Mining of Unstructured Textual Data." Symmetry 12, no. 11 (2020): 1902. http://dx.doi.org/10.3390/sym12111902.

Abstract:
The identification of human behavior can provide useful information across multiple job spectra. Recent advances in applying data-based approaches to social sciences have increased the feasibility of modeling human behavior. In particular, studying human behavior by analyzing unstructured textual data has recently received considerable attention because of the abundance of textual data. The main objective of the present study was to discuss the primary methods for identifying and predicting human behavior through the mining of unstructured textual data. Of the 823 articles analyzed, 87 met the predefined inclusion criteria and were included in the literature review. Our results show that the included articles could be symmetrically classified into two groups. The first group of articles attempted to identify the leading indicators of human behavior in unstructured textual data. In this group, the data-based approaches had three main components: (1) collecting self-reported survey data, (2) collecting data from social media and extracting data features, and (3) applying correlation analysis to evaluate the relationship between two sets of data. In contrast, the second group focused on the accuracy of data-based approaches for predicting human behavior. In this group, the data-based approaches could be categorized into (1) approaches based on labeled unstructured textual data and (2) approaches based on unlabeled unstructured textual data. The review provides a comprehensive insight into unstructured textual data mining to identify and predict human behavior and personality traits.
19

Kim, Geun-hyung, Seongmo Yang, Jihoon Kang, Jin-eun Jeong, and Seung Hwan Park. "Analysis of Weapon System Unstructured Data Using Text Mining." Journal of Applied Reliability 20, no. 4 (2020): 349–56. http://dx.doi.org/10.33162/jar.2020.12.20.4.349.

20

Krzywicki, Alfred, Wayne Wobcke, Michael Bain, John Calvo Martinez, and Paul Compton. "Data mining for building knowledge bases: techniques, architectures and applications." Knowledge Engineering Review 31, no. 2 (2016): 97–123. http://dx.doi.org/10.1017/s0269888916000047.

Abstract:
Data mining techniques for extracting knowledge from text have been applied extensively to applications including question answering, document summarisation, event extraction and trend monitoring. However, current methods have mainly been tested on small-scale customised data sets for specific purposes. The availability of large volumes of data and high-velocity data streams (such as social media feeds) motivates the need to automatically extract knowledge from such data sources and to generalise existing approaches to more practical applications. Recently, several architectures have been proposed for what we call knowledge mining: integrating data mining for knowledge extraction from unstructured text (possibly making use of a knowledge base), and at the same time, consistently incorporating this new information into the knowledge base. After describing a number of existing knowledge mining systems, we review the state-of-the-art literature on both current text mining methods (emphasising stream mining) and techniques for the construction and maintenance of knowledge bases. In particular, we focus on mining entities and relations from unstructured text data sources, entity disambiguation, entity linking and question answering. We conclude by highlighting general trends in knowledge mining research and identifying problems that require further research to enable more extensive use of knowledge bases.
21

Kumar Reddy, S. Dinesh. "Web Mining to Detect Online Spread of Terrorism." International Scientific Journal of Engineering and Management 04, no. 05 (2025): 1–7. https://doi.org/10.55041/isjem03478.

Abstract:
In recent times, terrorism has grown in an exponential manner in certain parts of the world. This enormous growth in terrorist activities has made it important to stop terrorism and prevent its spread before it causes damage to human life or property. With the development of technology, the internet has become a medium for spreading terrorism through speeches and videos. Terrorist organizations use the internet to harm and defame individuals and also to promote terrorist activities through web pages that push people to join terrorist organizations and commit crimes on behalf of those organizations. Web mining and data mining are used together for the purpose of efficient system development. Web mining includes many different text mining methods that can help to scan and extract relevant data from unstructured data. Text mining is very helpful in detecting various patterns, keywords, and significant information in unstructured texts. Data mining and web mining systems are widely used for mining from text. Data mining algorithms are used to manage organized data sets, and web mining algorithms can help in mining and extracting from unstructured web pages and text data available across the web. Websites built on different platforms have varying data structures, which makes them quite difficult for a single algorithm to read. Keywords: Terrorism, naïve-bayes, random forest, online spread
22

Wang, Jiangping. "Extracting Value from Unstructured Data – Implementing Text Analytics on the Voice of Student." Transactions on Machine Learning and Artificial Intelligence 8, no. 4 (2020): 14–22. http://dx.doi.org/10.14738/tmlai.84.8456.

Abstract:
Unstructured data is chaotic and messy, with little or no metadata, and lacks traditional organizational structure. However, like any structured data, unstructured data is also a valuable business asset. Many times, it is text-heavy and needs extensive preprocessing before data mining algorithms can be applied to build models that reveal the value hidden in the data. Text as a form of data is widely used in business operations as a major way of communication, generating increasing volumes of data. Text data in its raw form is relatively dirty. The embedded business value can be extracted through approaches in text mining and text analytics. This paper presents a case study of this general process of revealing value in unstructured data, applied to data collected to support online learning and student assistance.
23

Jiang, Kai, Like Liu, Rong Xiao, and Nenghai Yu. "Mining Local Specialties for Travelers by Leveraging Structured and Unstructured Data." Advances in Multimedia 2012 (2012): 1–9. http://dx.doi.org/10.1155/2012/987124.

Abstract:
Recently, many local review websites such as Yelp are emerging, which have greatly facilitated people's daily life such as cuisine hunting. However, they fail to meet travelers' demands because travelers are more concerned about a city's local specialties than about the city's high-ranked restaurants. To solve this problem, this paper presents a local specialty mining algorithm, which utilizes both the structured data from local review websites and the unstructured user-generated content (UGC) from community Q&A websites and travelogues. The proposed algorithm extracts dish names from local review data to build a document for each city, and applies a tf-idf weighting algorithm to these documents to rank dishes. Dish-city correlations are calculated from unstructured UGC and combined with the tf-idf ranking score to discover local specialties. Finally, duplicates in the local specialty mining results are merged. A recommendation service is built to present local specialties to travelers, along with the specialties' associated restaurants, Q&A threads, and travelogues. Experiments on a large data set show that the proposed algorithm can achieve good performance and that, compared to using local review data alone, leveraging unstructured UGC can boost the mining performance considerably, especially in large cities.
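The per-city ranking step described in the abstract above can be illustrated with a generic tf-idf computation. This is a minimal sketch under stated assumptions, not the paper's algorithm: the weighting variant (term count divided by document length, times the natural log of documents over document frequency), the function name `tfidf_scores`, and the placeholder city and dish labels are all invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_scores(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Score each term per document as (count / doc length) * log(N / df)."""
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))  # count each term once per document
    scores: Dict[str, Dict[str, float]] = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        total = len(terms)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Placeholder data: one "document" of extracted dish names per city.
city_docs = {
    "city_a": ["dish_1", "dish_1", "dish_2", "dish_3"],
    "city_b": ["dish_4", "dish_4", "dish_2", "dish_3"],
    "city_c": ["dish_2", "dish_3", "dish_3", "dish_5"],
}
scores = tfidf_scores(city_docs)
top_city_a = max(scores["city_a"], key=scores["city_a"].get)
```

In this toy data, dishes that appear in every city receive a score of zero, while a dish mentioned often in only one city ranks highest there, which is the intuition behind using tf-idf to surface local specialties.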
24

Shastri, Shankarayya, Veeragangadhara Swamy Teligi Math, and Patil Nagaraja Siddalingappa. "Sensing complicated meanings from unstructured data: a novel hybrid approach." International Journal of Electrical and Computer Engineering (IJECE) 14, no. 1 (2024): 711–20. http://dx.doi.org/10.11591/ijece.v14i1.pp711-720.

Abstract:
The majority of data on computers nowadays is in the form of unstructured data and unstructured text. The inherent ambiguity of natural language makes it incredibly difficult but also highly profitable to find hidden information or comprehend complex semantics in unstructured text. In this paper, we present a hybrid architecture combining natural language processing (NLP) and a convolutional neural network (CNN), called automated analysis of unstructured text using machine learning (AAUT-ML), for the detection of complex semantics in unstructured data; it enables formal semantic knowledge to be extracted from an unstructured text corpus and understood by different users. AAUT-ML has been evaluated using three datasets, data mining (DM), operating system (OS), and database (DB), and compared with the existing models, i.e., YAKE, term frequency-inverse document frequency (TF-IDF) and text-R. The results show better outcomes in terms of precision, recall, and macro-averaged F1-score. This work presents a novel method for identifying complex semantics using unstructured data.
25

Shastri, Shankarayya, Teligi Math Veeragangadhara Swamy, and Siddalingappa Patil Nagaraja. "Sensing complicated meanings from unstructured data: a novel hybrid approach." International Journal of Electrical and Computer Engineering (IJECE) 14, no. 1 (2024): 711–20. https://doi.org/10.11591/ijece.v14i1.pp711-720.

Full text
Abstract:
The majority of data on computers nowadays is in the form of unstructured data and unstructured text. The inherent ambiguity of natural language makes it incredibly difficult but also highly profitable to find hidden information or comprehend complex semantics in unstructured text. In this paper, we present the combination of natural language processing (NLP) and convolution neural network (CNN) hybrid architecture called automated analysis of unstructured text using machine learning (AAUT-ML) for the detection of complex semantics from unstructured data that enables different users to make understand formal semantic knowledge to be extracted from an unstructured text corpus. The AAUT-ML has been evaluated using three datasets data mining (DM), operating system (OS), and data base (DB), and compared with the existing models, i.e., YAKE, term frequency-inverse document frequency (TF-IDF) and text-R. The results show better outcomes in terms of precision, recall, and macro-averaged F1-score. This work presents a novel method for identifying complex semantics using unstructured data.
26

Bhargavi Konda. "The impact of data preprocessing on data mining outcomes." World Journal of Advanced Research and Reviews 15, no. 3 (2022): 540–44. https://doi.org/10.30574/wjarr.2022.15.3.0931.

Abstract:
Data preprocessing is a vital initial step in knowledge discovery because it determines the success of data mining projects. The quality and representation of a dataset are the primary concern, because redundant, irrelevant, noisy, or unreliable information severely disrupts the knowledge discovery process. Preprocessing converts unstructured data into an analyzable format and resolves inconsistencies, errors, and missing values to preserve the integrity of data mining results. By correcting data quality problems and organizing the data properly, it improves the accuracy, efficiency, and interpretability of data mining models. Preprocessing is therefore the essential foundation of the data mining pipeline, providing multiple techniques for converting raw data into an effective analytical format. Data mining depends heavily on these operations because they help ensure sound analysis results through error correction, well-structured data, and the handling of missing data points.
27

Fernando, S.G.S, Md GaparMdJohar, and S.N. Perera. "Empirical Analysis of Data Mining Techniques for Social Network Websites." COMPUSOFT: An International Journal of Advanced Computer Technology 03, no. 02 (2014): 582–92. https://doi.org/10.5281/zenodo.14640033.

Abstract:
Social networks allow users to collaborate with others. People of similar backgrounds and interests meet and cooperate using these social networks, enabling them to share information across the world. Social networks contain millions of unprocessed raw data items. By analyzing this data, new knowledge can be gained. Since this data is dynamic and unstructured, traditional data mining techniques are not appropriate. Web data mining is an interesting field with a vast number of applications. The growth of online social networks has significantly increased the data content available, because profile holders have become more active producers and distributors of such data. This paper identifies and analyzes existing web mining techniques used to mine social network data.
28

AL-Mashhadany, Abeer K., Dalal N. Hamood, Ahmed T. Sadiq Al-Obaidi, and Waleed K. Al-Mashhadany. "Extracting numerical data from unstructured Arabic texts(ENAT)." Indonesian Journal of Electrical Engineering and Computer Science 21, no. 3 (2021): 1759–70. https://doi.org/10.11591/ijeecs.v21.i3.pp1759-1770.

Abstract:
Unstructured data has become a challenge because recent years have brought the ability to gather massive amounts of data from annotated documents. This paper is concerned with the analysis of unstructured Arabic text. Manipulating unstructured text and converting it into a form a computer can understand is a high-level aim, and an important step toward it is understanding numerical phrases. This paper aims to extract numerical data from unstructured Arabic text in general. The work attempts to recognize numerical phrases, analyze them, and convert them into integer values. The inference engine is based on Arabic linguistic and morphological rules. The method combines rules for numerical nouns with Arabic morphological rules in order to achieve a highly accurate extraction method. Arithmetic operations are applied to convert a numerical phrase into an integer value, and the proper operation is determined by the linguistic and morphological rules. It is shown that applying Arabic linguistic rules together with arithmetic operations succeeds in extracting numerical data from unstructured Arabic text with accuracy reaching 100%.
29

Devendra, Kumar Mishra. "CHALLENGES IN TEXT MINING FOR BUSINESS INTELLIGENCE." International Journal of Engineering Technologies and Management Research 5, no. 2 (SE) (2018): 301–4. https://doi.org/10.5281/zenodo.1247479.

Abstract:
This is the era of the internet, a vast space to which large amounts of data are added every day, and the volume of interconnected digital data is exploding. Big data mining can retrieve useful information from large datasets or streams of data. Analysis can also be done in a distributed environment. The framework needed to analyze this large amount of data must support statistical analysis and data mining. It should be designed so that big data and traditional data can be combined, so that results come from analyzing new data together with old data. Traditional tools are not sufficient to extract information that is hidden.
30

Azeroual, Otmane. "A Text and Data Analytics Approach to Enrich the Quality of Unstructured Research Information." Computer and Information Science 12, no. 4 (2019): 84. http://dx.doi.org/10.5539/cis.v12n4p84.

Abstract:
With the increased accessibility of research information, the demands on research information systems (RIS) that are expected to automatically generate and process knowledge are increasing. Furthermore, the quality of the RIS data entries from the individual sources of information causes problems. If the data in a RIS is structured, users can read and filter it to meet their information and knowledge needs without any problems. The technique that allows text databases and text sources to be analyzed, and knowledge to be extracted from unknown texts, is referred to as text mining or text data mining and is based on the principles of data mining. Text mining allows large heterogeneous sources of research information to be classified automatically and assigned to specific topics. Research information has always played a major role in higher education and academic institutions, although it has usually been available in unstructured form in RIS and grows faster than structured data. Searching it can waste the time of RIS staff in universities and can lead to bad decision-making. For this reason, the present paper proposes a new approach to obtaining structured research information from heterogeneous information systems. It is part of an approach to the semantic integration of unstructured data, using a RIS as an example. The purpose of this paper is to investigate text and data mining methods in the context of RIS and to develop a quality improvement model that helps universities and academic institutions using RIS to enrich unstructured research information.
31

Zia, Amjad, Muzzamil Aziz, Ioana Popa, Sabih Ahmed Khan, Amirreza Fazely Hamedani, and Abdul R. Asif. "Artificial Intelligence-Based Medical Data Mining." Journal of Personalized Medicine 12, no. 9 (2022): 1359. http://dx.doi.org/10.3390/jpm12091359.

Abstract:
Understanding published unstructured textual data using traditional text mining approaches and tools is becoming a challenging issue due to the rapid increase in electronic open-source publications. The application of data mining techniques in the medical sciences is an emerging trend; however, traditional text-mining approaches are insufficient to cope with the current upsurge in the volume of published data. Therefore, artificial intelligence-based text mining tools are being developed and used to process large volumes of data and to explore the hidden features and correlations in the data. This review provides a clear-cut and insightful understanding of how artificial intelligence-based data-mining technology is being used to analyze medical data. We also describe a standard process of data mining based on CRISP-DM (Cross-Industry Standard Process for Data Mining) and the most common tools/libraries available for each step of medical data mining.
32

Voruganti, Santhosh. "Survey on Data-intensive Applications, Tools and Techniques for Mining Unstructured Data." International Journal of Computer Applications 146, no. 12 (2016): 23–27. http://dx.doi.org/10.5120/ijca2016910946.

33

O.U.Askaraliev. "Using data mining technologies to optimize queries in an unstructured data warehouse." «Muhandislik va Iqtisodiyot» jurnali 3, no. 1 (2025): 10–18. https://doi.org/10.5281/zenodo.14924973.

Abstract:
This article examines the specific features of small business activity and the aspects it shares with local governance, the structure of the economic mechanism for managing small business at the local level, and methodological issues in evaluating the effectiveness of this mechanism.
34

Thomas, David A. "Searching for Significance in Unstructured Data: Text Mining with Leximancer." European Educational Research Journal 13, no. 2 (2014): 235–56. http://dx.doi.org/10.2304/eerj.2014.13.2.235.

35

Hsu, Pei-Ling, Hsiao-Shan Hsieh, Jheng-He Liang, and Yi-Shin Chen. "Mining various semantic relationships from unstructured user-generated web data." Journal of Web Semantics 31 (March 2015): 27–38. http://dx.doi.org/10.1016/j.websem.2014.11.004.

36

Ji, Ming, Qi He, Jiawei Han, and Scott Spangler. "Mining strong relevance between heterogeneous entities from unstructured biomedical data." Data Mining and Knowledge Discovery 29, no. 4 (2015): 976–98. http://dx.doi.org/10.1007/s10618-014-0396-4.

37

Kahya-Özyirmidokuz, Esra. "Analyzing unstructured Facebook social network data through web text mining." Information Development 32, no. 1 (2014): 70–80. http://dx.doi.org/10.1177/0266666914528523.

38

Jeong, Wuseong, JungJin Kim, and Hanseok Jeong. "Information Extraction from Unstructured Data on Microplastics through Text Mining." Journal of Korean Society of Environmental Engineers 45, no. 1 (2023): 34–42. http://dx.doi.org/10.4491/ksee.2023.45.1.34.

Abstract:
Objectives: In this study, we seek to provide a thorough insight into how people perceive microplastics and to uncover issues and hidden trends concerning significant microplastic pollution problems by analyzing unstructured data on microplastics. Methods: Environmental news articles related to microplastics were collected. Text mining techniques including data pre-processing, word clouds, TF-IDF weight-based trend analysis, and LDA topic modeling were used to analyze the textual data. Results and Discussion: The public's interest in microplastics is consistently growing, according to an analysis of all environmental news and the keyword 'microplastic' from 2014 to 2021 conducted via BIGKinds. The keyword 'trash' carried by far the largest weight among the words. The top 5 keywords connected to microplastics did not fade away and continued to appear even though the socially prominent keywords varied from year to year during the study period. This indicates that the primary microplastic issues reflected in these keywords have not yet been solved. Our study is limited in subject diversity because we focused only on microplastic news. The results, however, covered all processes from the emergence of plastic pollution to its treatment, such as microplastic pollution sources, microplastic detection, and prevention methods against microplastics. Conclusion: Text mining analysis was performed on microplastics in environmental news and revealed issues and trends in microplastic pollution. This study presents a new methodology for analyzing environmental and social problems, suggesting that it could enable a multidimensional understanding of environmental problems and help establish environmental policies.
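
As an illustration of the TF-IDF weighting mentioned in this abstract, the sketch below scores each term by its relative frequency in a document multiplied by the logarithm of the inverse document frequency. The unsmoothed formula, the function name `tf_idf`, and the toy token lists are assumptions made for illustration; the study's exact weighting variant is not stated in the abstract.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * log(N / document frequency),
    an unsmoothed variant chosen here for simplicity.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already tokenized news snippets.
news = [
    ["microplastic", "trash", "ocean"],
    ["microplastic", "policy"],
    ["microplastic", "trash", "trash", "detection"],
]
for doc_weights in tf_idf(news):
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1]))
```

In this variant a term that occurs in every document receives a weight of zero, which is why a collection-wide search keyword does not dominate such a ranking.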
39

Kumar, Akshi, Vikrant Dabas, and Parul Hooda. "Text classification algorithms for mining unstructured data: a SWOT analysis." International Journal of Information Technology 12, no. 4 (2018): 1159–69. http://dx.doi.org/10.1007/s41870-017-0072-1.

40

Lee, Jong Hwa, and Hyun-Kyu Lee. "A study on unstructured text mining algorithm through R programming based on data dictionary." Journal of the Korea Industrial Information Systems Research 20, no. 2 (2015): 113–24. http://dx.doi.org/10.9723/jksiis.2015.20.2.113.

41

Rahman, Nayem. "Data Mining Techniques and Applications." International Journal of Strategic Information Technology and Applications 9, no. 1 (2018): 78–97. http://dx.doi.org/10.4018/ijsita.2018010104.

Abstract:
Data mining has been gaining attention in complex business environments, with the rapid increase in data volume and the ubiquity of data in this age of the internet and social media. Organizations are interested in making informed decisions with a complete set of data, including structured and unstructured data that originate both internally and externally. Different data mining techniques have evolved over the last two decades and have been developed to solve a wide variety of business problems. Practitioners and researchers in industry and academia continuously develop and experiment with a variety of data mining techniques. This article provides an overview of data mining techniques that are widely used in different fields to discover knowledge and solve business problems, with an update based on the extant literature as of 2018. This may help practitioners and researchers gain a holistic view of data mining techniques.
42

Nirmla Sharma. "Emerging Need for Disruption in the Next Trend of Artificial Intelligence-Controlled Transformation Using Knowledge Mining." International Journal of Engineering and Advanced Technology (IJEAT) 14, no. 3 (2025): 26–32. https://doi.org/10.35940/ijeat.C4564.14030225.

Abstract:
Knowledge mining is an emerging type of artificial intelligence (AI) that uses a combination of AI services to drive content understanding over large volumes of unstructured, semi-structured, and structured data, allowing organizations to understand their data in depth, search it, uncover insights, and find relationships and patterns at scale. Whereas the first wave of AI consisted of many narrow applications, such as training a particular model on a single data source of a certain kind for a particular problem, knowledge mining is the next wave of AI, producing a rich set of data relationships and patterns. It has rapidly become a central part of enterprise digital transformation initiatives that fundamentally change how organizations make sense of real-world data. This survey notes that more than two-thirds (68%) of respondents to a recent Harvard Business Review Analytic Services survey think knowledge mining is key to achieving their companies' strategic objectives in the next 18 months. The need for knowledge mining is also growing rapidly: 80% are using manual approaches to handle unstructured data, and those approaches will soon be overwhelmed by the growth of data and of potential use cases in which this data delivers high value.
43

Ignaczak, Luciano, Guilherme Goldschmidt, Cristiano André Da Costa, and Rodrigo Da Rosa Righi. "Text Mining in Cybersecurity." ACM Computing Surveys 54, no. 7 (2021): 1–36. http://dx.doi.org/10.1145/3462477.

Abstract:
The growth of data volume has changed cybersecurity activities, demanding a higher level of automation. In this new cybersecurity landscape, text mining emerged as an alternative to improve the efficiency of the activities involving unstructured data. This article proposes a Systematic Literature Review (SLR) to present the application of text mining in the cybersecurity domain. Using a systematic protocol, we identified 2,196 studies, out of which 83 were summarized. As a contribution, we propose a taxonomy to demonstrate the different activities in the cybersecurity domain supported by text mining. We also detail the strategies evaluated in the application of text mining tasks and the use of neural networks to support activities involving unstructured data. The work also discusses text classification performance with a view to its application in real-world solutions. The SLR also highlights open gaps for future research, such as the analysis of non-English content and the intensification of the use of neural networks.
44

MELLI, GABOR, XINDONG WU, PAUL BEINAT, et al. "TOP-10 DATA MINING CASE STUDIES." International Journal of Information Technology & Decision Making 11, no. 02 (2012): 389–400. http://dx.doi.org/10.1142/s021962201240007x.

Abstract:
We report on the panel discussion held at the ICDM'10 conference on the top 10 data mining case studies in order to provide a snapshot of where and how data mining techniques have made significant real-world impact. The tasks covered by 10 case studies range from the detection of anomalies such as cancer, fraud, and system failures to the optimization of organizational operations, and include the automated extraction of information from unstructured sources. From the 10 cases we find that supervised methods prevail while unsupervised techniques play a supporting role. Further, significant domain knowledge is generally required to achieve a completed solution. Finally, we find that successful applications are more commonly associated with continual improvement rather than by single "aha moments" of knowledge ("nugget") discovery.
45

Yassir, Ali Hameed, Ali A. Mohammed, Adel Abdul-Jabbar Alkhazraji, Mustafa Emad Hameed, Mohammed Saad Talib, and Mohanad Faeq Ali. "Sentimental classification analysis of polarity multi-view textual data using data mining techniques." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 5 (2020): 5526. http://dx.doi.org/10.11591/ijece.v10i5.pp5526-5534.

Abstract:
The data and information available in most community environments are complex in nature. Sentimental data resources may consist of textual data collected from multiple information sources with different representations and usually handled by different analytical models. These characteristics can give rise to multi-view polarity textual data. However, creating knowledge from this type of sentimental textual data requires considerable analytical effort and capability. In particular, data mining practices can provide exceptional results in handling textual data formats. Moreover, when the textual data exists in multi-view or unstructured formats, hybrid and integrated analysis with text data mining algorithms is vital for obtaining helpful results. The objective of this research is to enhance knowledge discovery from sentimental multi-view textual data, which can be considered an unstructured data format, by classifying polarity documents into two categories of useful information. This paper discusses a proposed framework of integrated data mining algorithms, which applies the X-means algorithm for clustering and the HotSpot algorithm for association rules. The analysis results show improved accuracy in classifying sentimental multi-view textual data into two categories when the proposed framework is applied to an online polarity user-review dataset on given topics.
46

Kim, M., and C. Hong. "Unstructured Social Media Data Mining System Based on Emotional Database and Unstructured Information Management Architecture Framework." Advanced Science Letters 23, no. 3 (2017): 1668–72. http://dx.doi.org/10.1166/asl.2017.8614.

47

Rajasekhar, K., and P. Venkata Maheswara. "Measuring Different Tasks for Unstructured Data and High Speed Data in Data Stream Mining." International Journal of Computer Sciences and Engineering 7, no. 5 (2019): 582–89. http://dx.doi.org/10.26438/ijcse/v7i5.582589.

48

Lamghari, Zineb. "Unstructured Business Processes Improvement using Process Mining Techniques." ASM Science Journal 17 (March 18, 2022): 1–13. http://dx.doi.org/10.32802/asmscj.2022.965.

Abstract:
Executing loosely structured processes generates unstructured behaviours. An Unstructured Business Process (UBP) therefore raises issues that are difficult to analyse and understand because of its complexity and variability. Moreover, the need for an immediate response is clearly apparent in operational systems. It is therefore necessary to study the challenges that can arise during the transition from a structured BP to an unstructured one. In this context, process mining plays a dominant role in understanding business process complexity using the event data that results from business process execution. This paper mainly treats three challenges related to unstructured BPs. The first challenge is how to support UBPs at runtime using process mining techniques. The second is how to manage UBP variability, taking variant conditions into consideration. The third is how to adapt UBPs dynamically according to the company's business rules and conditions.
49

Lee, Gi-Eun, and Eun-Jun Park. "Research Trends Related on Hair Style in the Text Mining of Big Data Analysis." Journal of the Korean Society of Cosmetology 30, no. 3 (2024): 492–98. http://dx.doi.org/10.52660/jksc.2024.30.3.492.

Abstract:
This study identified research trends using text mining, which analyzes text, a form of unstructured data in big data. Keyword term frequency (TF), related-word analysis (N-gram), and term frequency-inverse document frequency (TF-IDF) were computed on the titles and Korean abstracts of domestic academic papers retrieved with the keyword 'hairstyle' from the Research Information Service (RISS). The results confirmed that hairstyles represent the characteristics of the times, and that papers either create hairstyles or use statistics to understand preferences. The purpose of this study is to provide basic data for identifying research trends through keywords extracted by text mining, which analyzes unstructured text in a new way made possible by the development of big data processing technology.
50

Kaur, Payalpreet, Raghu Garg, Ravinder Singh, and Mandeep Singh. "Research on the Application of Web Mining Technique Based on XML for Unstructured Web Data Using LINQ." Advanced Materials Research 403-408 (November 2011): 1062–67. http://dx.doi.org/10.4028/www.scientific.net/amr.403-408.1062.

Abstract:
Web data mining is a field that has gained popularity recently with the advancement of web mining technologies. Web data mining is the extraction of data from the web: a technique used to crawl through various web resources to collect required information, which enables an individual or a company to promote business and to understand marketing dynamics, new promotions circulating on the Internet, and so on. Data on the web is unstructured, irregular, and lacks a fixed unified pattern, because it is presented in HTML, which represents data in a presentation format and is unable to handle semi-structured or unstructured data. These difficulties led to the emergence of XML-based web data mining. XML was created so that richly structured documents could be used over the web. XML provides a standard for data exchange and data storage. This paper presents a web data mining model based on XML. In this model, unstructured data is first transformed to XML, the XML document is then stored in a database in the form of a string tree, and specific records are searched using a LINQ query. If a record does not exist in the database, the model checks the specific website for updates and repeats the same steps. Finally, the data selected by the LINQ query is displayed in a web browser. What increases the speed of data extraction and reduces extraction time is the database, which stores data previously extracted by one user so that other users can retrieve it by passing a LINQ query. The model does not need a separate XSL file, because it stores the XML document in the database in the form of a string tree. The model is implemented using C# with XML.
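
The transform, store, and query steps described in this abstract can be sketched roughly as follows. The paper's model is implemented in C# with LINQ; this sketch instead uses Python's standard `xml.etree.ElementTree` module, with a path query standing in for the LINQ query. The element names, the `site` attribute, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Serialize extracted records to an XML string (the form that is stored)."""
    root = ET.Element("records")
    for rec in records:
        node = ET.SubElement(root, "record", site=rec["site"])
        ET.SubElement(node, "title").text = rec["title"]
    return ET.tostring(root, encoding="unicode")


def query_by_site(xml_text, site):
    """Return the titles of stored records for one site.

    An empty list signals a miss, in which case the caller would re-crawl
    the site. The site value is interpolated into the path expression as-is,
    so this sketch assumes it contains no quote characters.
    """
    root = ET.fromstring(xml_text)
    matches = root.findall(f"./record[@site='{site}']")
    return [node.findtext("title") for node in matches]


# Hypothetical records standing in for data extracted from web pages.
stored = records_to_xml([
    {"site": "example.org", "title": "Promo A"},
    {"site": "example.net", "title": "Promo B"},
])
print(query_by_site(stored, "example.org"))
```

Keeping the serialized XML as the stored form mirrors the abstract's point that previously extracted data can be reused by later queries without a new crawl.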