Academic literature on the topic "Heterogeneous Textual Data Mining"

Create an accurate citation in the APA, MLA, Chicago, Harvard, and other styles

Choose a source type:

Consult the topical lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Heterogeneous Textual Data Mining".

Next to every source in the list of references there is an "Add to bibliography" button. Press the button, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Heterogeneous Textual Data Mining"

1

Brahme, Ashwini. "Association Rule Mining and Information Retrieval Using Stemming and Text Mining Techniques". Journal of Information Systems Engineering and Management 10, no. 18s (March 11, 2025): 622–28. https://doi.org/10.52783/jisem.v10i18s.2958.

Full text
Abstract
Heterogeneous, complex, and enormous data mining plays a significant role in today's big data scenario all over the globe. The research paper is directed toward natural language processing, mining of textual data, and pattern discovery through association rule mining. The research aims to mine digital news of epidemic diseases and generate the hidden patterns in the corpus data. The present study also aims to develop a knowledge discovery system for healthcare for the prediction of epidemic viral diseases and their related measures, which will be helpful for healthcare experts, doctors, and healthcare organizations, as well as for governments, in taking precautionary measures. The study is designed for predictive analytics of epidemic diseases and their patterns using association rule mining. The precautionary measures for healthcare and the highly impacted geographical locations of widespread diseases are generated through the proposed system.
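The association-rule step the abstract describes can be illustrated with a minimal support/confidence computation over tokenized news items. This is a generic sketch with an invented toy corpus and thresholds, not the paper's system; the `rules` helper and its parameters are hypothetical names.

```python
from itertools import combinations

# Toy corpus: each "news item" is reduced to a set of salient tokens.
corpus = [
    {"fever", "cough", "dengue"},
    {"fever", "rash", "dengue"},
    {"fever", "cough", "flu"},
    {"fever", "dengue", "travel"},
]

def support(itemset, corpus):
    """Fraction of documents containing every token in `itemset`."""
    return sum(itemset <= doc for doc in corpus) / len(corpus)

def rules(corpus, min_support=0.5, min_confidence=0.7):
    """Enumerate token pairs and keep rules lhs -> rhs above both thresholds."""
    tokens = set().union(*corpus)
    found = []
    for a, b in combinations(sorted(tokens), 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, corpus)
            if s < min_support:
                continue
            conf = s / support({lhs}, corpus)
            if conf >= min_confidence:
                found.append((lhs, rhs, s, conf))
    return found

for lhs, rhs, s, conf in rules(corpus):
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={conf:.2f}")
```

On this toy corpus the rule "dengue -> fever" holds with full confidence, the kind of hidden pattern the study mines from epidemic news at scale.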
2

Ali, Wajid, Wanli Zuo, Rahman Ali, Xianglin Zuo and Gohar Rahman. "Causality Mining in Natural Languages Using Machine and Deep Learning Techniques: A Survey". Applied Sciences 11, no. 21 (October 27, 2021): 10064. http://dx.doi.org/10.3390/app112110064.

Full text
Abstract
The era of big textual corpora and machine learning technologies has paved the way for researchers in numerous data mining fields. Among them, causality mining (CM) from textual data has become a significant area of concern and has attracted more attention from researchers. Causality (cause-effect relations) serves as an essential category of relationships, which plays a significant role in question answering, future event prediction, discourse comprehension, decision making, future scenario generation, medical text mining, behavior prediction, and textual entailment prediction. Still, after decades of development, CM techniques remain open to performance enhancement, especially for ambiguous and implicitly expressed causalities. The ineffectiveness of the early attempts is mainly due to small, ambiguous, heterogeneous, and domain-specific datasets constructed with manual linguistic and syntactic rules. Many researchers have deployed shallow machine learning (ML) and deep learning (DL) techniques to deal with such datasets, and they achieved satisfactory performance. In this survey, an effort has been made to provide a comprehensive review of some state-of-the-art shallow ML and DL approaches in CM. We present a detailed taxonomy of CM and discuss popular ML and DL approaches with their comparative weaknesses and strengths, applications, popular datasets, and frameworks. Lastly, future research challenges are discussed with illustrations of how to transform them into productive future research directions.
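The rule-based end of the CM spectrum the survey covers can be sketched with a simple connective-matching extractor. This is a toy illustration of explicit cause-effect extraction only; the patterns and sentences are invented, and real systems combine far richer lexico-syntactic patterns with ML/DL models for implicit causality.

```python
import re

# A few explicit causal connectives, matched against whole sentences.
PATTERNS = [
    re.compile(r"(?P<effect>.+?) because (?P<cause>.+)", re.I),
    re.compile(r"(?P<cause>.+?) leads to (?P<effect>.+)", re.I),
    re.compile(r"(?P<effect>.+?) is caused by (?P<cause>.+)", re.I),
]

def extract_causality(sentence):
    """Return (cause, effect) for the first matching pattern, else None."""
    for pattern in PATTERNS:
        m = pattern.match(sentence)
        if m:
            return m.group("cause").strip(" ."), m.group("effect").strip(" .")
    return None

print(extract_causality("The flight was delayed because fog covered the runway."))
print(extract_causality("Smoking leads to lung disease."))
```

Such patterns miss implicit or ambiguously expressed causality entirely, which is exactly the weakness that motivates the shallow ML and DL approaches reviewed in the survey.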
3

Makarevich, T. I. "Intellectual Analysis of Textual Information in Domain Fields in the System of e-Government". Digital Transformation, no. 2 (August 6, 2019): 46–52. http://dx.doi.org/10.38086/2522-9613-2019-2-46-52.

Full text
Abstract
The paper considers the application of data mining technology in scientific research as an intellectual analysis method in the domain field of e-Government. The topicality of the issue stems from the current absence of research of this kind in the Republic of Belarus. The paper illustrates how the programme package Rapid Miner and the language R have been applied in text mining. Concept indexing is identified as the most productive form of analyzing domain-field ontologies, and formal and linguistic approaches are found most effective for this analysis. The paper identifies the problems of word redundancy and word polysemy. The prognosis for further research lies in the interconnectivity of specialized ontologies studying heterogeneous terms on the basis of artificial intelligence (AI).
4

Dérozier, Sandra, Robert Bossy, Louise Deléger, Mouhamadou Ba, Estelle Chaix, Olivier Harlé, Valentin Loux, Hélène Falentin and Claire Nédellec. "Omnicrobe, an open-access database of microbial habitats and phenotypes using a comprehensive text mining and data fusion approach". PLOS ONE 18, no. 1 (January 20, 2023): e0272473. http://dx.doi.org/10.1371/journal.pone.0272473.

Full text
Abstract
The dramatic increase in the number of microbe descriptions in databases, reports, and papers presents a two-fold challenge for accessing the information: integration of heterogeneous data in a standard ontology-based representation and normalization of the textual descriptions by semantic analysis. Recent text mining methods offer powerful ways to extract textual information and generate ontology-based representation. This paper describes the design of the Omnicrobe application that gathers comprehensive information on habitats, phenotypes, and usages of microbes from scientific sources of high interest to the microbiology community. The Omnicrobe database contains around 1 million descriptions of microbe properties. These descriptions are created by analyzing and combining six information sources of various kinds, i.e. biological resource catalogs, sequence databases and scientific literature. The microbe properties are indexed by the Ontobiotope ontology and their taxa are indexed by an extended version of the taxonomy maintained by the National Center for Biotechnology Information. The Omnicrobe application covers all domains of microbiology. With simple or rich ontology-based queries, it provides easy-to-use support in the resolution of scientific questions related to the habitats, phenotypes, and uses of microbes. We illustrate the potential of Omnicrobe with a use case from the food innovation domain.
5

Farimani, Saeede Anbaee, Majid Vafaei Jahan and Amin Milani Fard. "From Text Representation to Financial Market Prediction: A Literature Review". Information 13, no. 10 (September 29, 2022): 466. http://dx.doi.org/10.3390/info13100466.

Full text
Abstract
News dissemination in social media causes fluctuations in financial markets. (Scope) Recent advanced methods in deep learning-based natural language processing have shown promising results in financial market analysis. However, understanding how to leverage large amounts of textual data alongside financial market information is important for the investors’ behavior analysis. In this study, we review over 150 publications in the field of behavioral finance that jointly investigated natural language processing (NLP) approaches and a market data analysis for financial decision support. This work differs from other reviews by focusing on applied publications in computer science and artificial intelligence that contributed to a heterogeneous information fusion for the investors’ behavior analysis. (Goal) We study various text representation methods, sentiment analysis, and information retrieval methods from heterogeneous data sources. (Findings) We present current and future research directions in text mining and deep learning for correlation analysis, forecasting, and recommendation systems in financial markets, such as stocks, cryptocurrencies, and Forex (Foreign Exchange Market).
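One of the simplest text-representation baselines such reviews cover is lexicon-based sentiment scoring of headlines. The sketch below is a generic illustration, not any method from the review; the lexicon, headlines, and the `sentiment` helper are all invented, and production systems use learned representations instead.

```python
# Invented toy sentiment lexicon for financial headlines.
POSITIVE = {"surge", "beat", "growth", "record", "rally"}
NEGATIVE = {"miss", "drop", "loss", "crash", "fears"}

def sentiment(headline):
    """Return (#positive - #negative) / #tokens, a score in [-1, 1]."""
    tokens = headline.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return score / len(tokens)

print(sentiment("Tech stocks rally on record earnings growth"))
print(sentiment("Shares drop as inflation fears mount"))
```

A time series of such scores is the kind of textual signal that the surveyed works fuse with market data for correlation analysis and forecasting.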
6

Tan, Weiyan. "ESG Performance Prediction and Driver Factor Mining for Listed Companies Based on Machine Learning: A Multi-Source Heterogeneous Data Fusion Analysis". Science, Technology and Social Development Proceedings Series 1 (March 21, 2025): 349–56. https://doi.org/10.70088/tmzjct41.

Full text
Abstract
With the acceleration of global economic integration and the growing focus on sustainable development, Environmental, Social, and Governance (ESG) factors have become key standards for evaluating a company's long-term value and risk. However, accurately measuring the ESG performance of listed companies and identifying the underlying driving factors remains a significant challenge. This paper proposes a Transformer-based multi-source heterogeneous data fusion model, MSformer, which analyzes diverse data, including financial reports, news, social media comments, and government announcements. It categorizes the data into three types: time-series structured data, time-series structured mapped data, and textual data. The model enhances feature extraction using the Spatial Frequency-coordinated Attention Mechanism (SFHA) and employs Support Vector Regression (SVR) for prediction. Experimental results show that MSformer outperforms other advanced models, achieving an outstanding 87.4% multi-class accuracy and 0.517 average prediction error, proving its effectiveness and advantage in ESG prediction.
7

Mikhnenko, Pavel. "Transformation of the largest Russian companies’ business vocabulary in annual reports: Data Mining". Upravlenets 13, no. 5 (November 3, 2022): 17–33. http://dx.doi.org/10.29141/2218-5003-2022-13-5-2.

Full text
Abstract
One of the promising areas of business analysis is the development of new methods and tools for accounting of nonfinancial and non-numeric information. There is a significant number of theoretical and practical solutions in this field; however, the issues of the transformation dynamics of companies’ business vocabulary need to be studied more extensively. The article aims to identify and interpret latent information reflecting strategic guidelines and conditions for the economic development of Russian enterprises. The methodology of the study is based on the concepts of narrative economics and multimodal business analytics, which is a system of scientific-practical methods for analyzing the activities of economic entities through the use of data from heterogeneous sources. Data Mining methods and tools for analyzing and systematizing large volumes of textual information were used. The data for the research were retrieved from the annual reports of the largest Russian companies for 2018–2020. Among the main indicators of the business vocabulary transformation considered in the paper are the occurrence of unique key tokens (UKTs) and the dynamics of their change, as well as the main contexts of UKTs relevant to the problem of development. The findings indicate noticeable changes in the vocabulary of Russian companies’ annual reports, such as a decline in covering formal aspects of economic activity and a growing debate on development in the presence of risk. It is shown that these trends were most clearly manifested in the reports of metallurgical and energy enterprises. The research results can serve as a basis for enhancing the analytical and predictive effectiveness of modern business analysis.
8

Peng, Hao, Jianxin Li, Yangqiu Song, Renyu Yang, Rajiv Ranjan, Philip S. Yu and Lifang He. "Streaming Social Event Detection and Evolution Discovery in Heterogeneous Information Networks". ACM Transactions on Knowledge Discovery from Data 15, no. 5 (June 26, 2021): 1–33. http://dx.doi.org/10.1145/3447585.

Full text
Abstract
Events happen in the real world and in real time, and they can be planned and organized for occasions such as social gatherings, festival celebrations, influential meetings, or sports activities. Social media platforms generate a lot of real-time text information regarding public events on different topics. However, mining social events is challenging because events typically exhibit heterogeneous textures and metadata are often ambiguous. In this article, we first design a novel event-based meta-schema to characterize the semantic relatedness of social events and then build an event-based heterogeneous information network (HIN) integrating information from an external knowledge base. Second, we propose a novel Pairwise Popularity Graph Convolutional Network, named PP-GCN, based on weighted meta-path instance similarity and textual semantic representation as inputs, to perform fine-grained social event categorization and learn the optimal weights of meta-paths in different tasks. Third, we propose a streaming social event detection and evolution discovery framework for HINs based on meta-path similarity search, historical information about meta-paths, and a heterogeneous DBSCAN clustering method. Comprehensive experiments on real-world streaming social text data are conducted to compare various social event detection and evolution discovery algorithms. Experimental results demonstrate that our proposed framework outperforms other alternative social event detection and evolution discovery techniques.
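The density-based clustering step named in the abstract can be illustrated with a minimal DBSCAN over Jaccard distances between posts' token sets. This is a generic sketch, not the paper's meta-path-based similarity or its heterogeneous variant; the posts, `eps`, and `min_pts` values are invented.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| of two token sets."""
    return 1.0 - len(a & b) / len(a | b)

def dbscan(points, eps, min_pts, dist):
    """Minimal DBSCAN: -1 marks noise, other labels are cluster ids."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbours = [j for j in range(len(points))
                      if dist(points[i], points[j]) <= eps]
        if len(neighbours) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # border point joins the cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neigh = [k for k in range(len(points))
                       if dist(points[j], points[k]) <= eps]
            if len(j_neigh) >= min_pts:      # j is a core point: expand
                queue.extend(j_neigh)
        cluster += 1
    return labels

posts = [
    {"earthquake", "tokyo", "magnitude"},
    {"earthquake", "tokyo", "tsunami"},
    {"tokyo", "earthquake", "aftershock"},
    {"final", "cup", "goal"},
    {"cup", "final", "penalty"},
    {"recipe", "pasta"},        # unrelated post -> noise
]
labels = dbscan(posts, eps=0.6, min_pts=2, dist=jaccard_distance)
print(labels)
```

The two dense groups of posts come out as separate event clusters and the lone post as noise; the paper replaces the plain Jaccard distance with learned meta-path instance similarity.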
9

Huang, Ru, Zijian Chen, Jianhua He and Xiaoli Chu. "Dynamic Heterogeneous User Generated Contents-Driven Relation Assessment via Graph Representation Learning". Sensors 22, no. 4 (February 11, 2022): 1402. http://dx.doi.org/10.3390/s22041402.

Full text
Abstract
Cross-domain decision-making systems face a huge challenge from the rapidly emerging uneven quality of user-generated data, which places a heavy responsibility on online platforms. Current content analysis methods primarily concentrate on non-textual contents, such as images and videos themselves, while ignoring the interrelationship between each user post’s contents. In this paper, we propose a novel framework named community-aware dynamic heterogeneous graph embedding (CDHNE) for relationship assessment, capable of mining heterogeneous information, latent community structure and dynamic characteristics from user-generated contents (UGC), which aims to solve complex non-Euclidean structured problems. Specifically, we introduce a Markov-chain-based metapath to extract heterogeneous contents and semantics in UGC. An edge-centric attention mechanism is elaborated for localized feature aggregation. Thereafter, we obtain the node representations from a micro perspective and apply them to the discovery of global structure via a clustering technique. In order to uncover the temporal evolutionary patterns, we devise an encoder–decoder structure, containing multiple recurrent memory units, which helps to capture the dynamics for relation assessment efficiently and effectively. Extensive experiments on four real-world datasets are conducted in this work, which demonstrate that CDHNE outperforms other baselines thanks to its comprehensive node representation, while also exhibiting the superiority of CDHNE in relation assessment. The proposed model is presented as a method of breaking down the barriers between traditional UGC analysis and abstract network analysis.
10

Williams, Lowri, Eirini Anthi, Laura Arman and Pete Burnap. "Topic Modelling: Going beyond Token Outputs". Big Data and Cognitive Computing 8, no. 5 (April 25, 2024): 44. http://dx.doi.org/10.3390/bdcc8050044.

Full text
Abstract
Topic modelling is a text mining technique for identifying salient themes from a number of documents. The output is commonly a set of topics consisting of isolated tokens that often co-occur in such documents. Manual effort is often associated with interpreting a topic’s description from such tokens. However, from a human’s perspective, such outputs may not adequately provide enough information to infer the meaning of the topics; thus, their interpretability is often inaccurately understood. Although several studies have attempted to automatically extend topic descriptions as a means of enhancing the interpretation of topic models, they rely on external language sources that may become unavailable, must be kept up to date to generate relevant results, and present privacy issues when training on or processing data. This paper presents a novel approach towards extending the output of traditional topic modelling methods beyond a list of isolated tokens. This approach removes the dependence on external sources by using the textual data themselves by extracting high-scoring keywords and mapping them to the topic model’s token outputs. To compare how the proposed method benchmarks against the state of the art, a comparative analysis against results produced by Large Language Models (LLMs) is presented. Such results report that the proposed method resonates with the thematic coverage found in LLMs and often surpasses such models by bridging the gap between broad thematic elements and granular details. In addition, to demonstrate and reinforce the generalisation of the proposed method, the approach was further evaluated using two other topic modelling methods as the underlying models and when using a heterogeneous unseen dataset. 
To measure the interpretability of the proposed outputs against those of the traditional topic modelling approach, independent annotators manually scored each output based on their quality and usefulness as well as the efficiency of the annotation task. The proposed approach demonstrated higher quality and usefulness, as well as higher efficiency in the annotation task, in comparison to the outputs of a traditional topic modelling method, demonstrating an increase in their interpretability.
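The general idea of extending topic tokens with high-scoring keywords drawn from the corpus itself can be sketched as follows. This is a minimal illustration of that direction only, not the paper's method; the documents, topics, stopword list, and the `extend_topic` helper are all invented.

```python
import math
from collections import Counter

docs = [
    "solar panels cut household energy bills",
    "wind turbines and solar farms expand renewable energy",
    "court ruling delays the energy merger case",
    "appeal court overturns the merger ruling",
]
# Token outputs of a hypothetical topic model.
topics = {0: ["energy", "solar"], 1: ["court", "merger"]}
STOP = {"the", "and", "a", "of"}

tokenized = [d.split() for d in docs]
df = Counter(t for doc in tokenized for t in set(doc))  # document frequency

def tfidf(doc):
    """TF-IDF scores for the tokens of one document."""
    tf = Counter(doc)
    return {t: tf[t] * math.log(len(docs) / df[t]) for t in tf}

def extend_topic(topic_tokens, top_n=3):
    """Top TF-IDF keywords from documents that contain a topic token."""
    scores = Counter()
    for doc in tokenized:
        if any(t in doc for t in topic_tokens):
            for term, s in tfidf(doc).items():
                if term not in topic_tokens and term not in STOP:
                    scores[term] += s
    return [t for t, _ in scores.most_common(top_n)]

for topic_id, tokens in topics.items():
    print(topic_id, tokens, "->", extend_topic(tokens))
```

The extension stays inside the corpus itself, mirroring the paper's point that no external language source needs to be queried or kept up to date.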
More sources

Theses on the topic "Heterogeneous Textual Data Mining"

1

Saneifar, Hassan. "Locating Information in Heterogeneous log files". Thesis, Montpellier 2, 2011. http://www.theses.fr/2011MON20092/document.

Full text
Abstract
In this thesis, we present contributions to the challenging issues encountered in question answering and in locating information in complex textual data, such as log files. Question answering systems (QAS) aim to find a relevant fragment of a document that could be regarded as the best possible concise answer to a question given by a user. In this work, we seek to propose a complete solution for locating information in a special kind of textual data, i.e., log files generated by EDA design tools. Nowadays, in many application areas, modern computing systems are instrumented to generate huge reports about occurring events in the format of log files. Log files are generated in every computing field to report the status of systems, products, or even the causes of problems that can occur. Log files may also include data about critical parameters, sensor outputs, or a combination of those. Analyzing log files, as an attractive approach for automatic system management and monitoring, has been enjoying a growing amount of attention [Li et al., 2005]. Although the process of generating log files is quite simple and straightforward, log file analysis can be a tremendous task that requires enormous computational resources, long time, and sophisticated procedures [Valdman, 2004]. Indeed, there are many kinds of log files generated in some application domains which are not systematically exploited in an efficient way because of their special characteristics. In this thesis, we are mainly interested in log files generated by Electronic Design Automation (EDA) systems. Electronic design automation is a category of software tools for designing electronic systems such as printed circuit boards and Integrated Circuits (IC). In this domain, to ensure the design quality, there are quality check rules which should be verified. Verification of these rules is principally performed by analyzing the generated log files. 
In the case of large designs, where the design tools may generate megabytes or gigabytes of log files each day, the problem is to wade through all of this data to locate the critical information needed to verify the quality check rules. These log files typically include a substantial amount of data. Accordingly, manually locating information is a tedious and cumbersome process. Furthermore, the particular characteristics of log files, especially those generated by EDA design tools, raise significant challenges for retrieving information from them. The specific features of log files limit the usefulness of manual analysis techniques and static methods. Automated analysis of such logs is complex due to their heterogeneous and evolving structures and their large, non-fixed vocabulary. In this thesis, with each contribution, we answer questions raised in this work by the data specificities or domain requirements. We investigate throughout this work the main concern: "How can the specificities of log files influence information extraction and natural language processing methods?". In this context, a key challenge is to provide approaches that take the log file specificities into account while considering the issues which are specific to QA in restricted domains. We present the following contributions: > Proposing a novel method to recognize and identify the logical units in log files in order to perform a segmentation according to their structure. We thus propose a method to characterize the complex logical units found in log files according to their syntactic characteristics. Within this approach, we propose an original type of descriptor to model the textual structure and layout of text documents. > Proposing an approach to locate the requested information in log files based on passage retrieval. 
To improve the performance of passage retrieval, we propose a novel query expansion approach to adapt an initial query to all types of corresponding log files and overcome difficulties such as vocabulary mismatches. Our query expansion approach relies on two relevance feedback steps. In the first one, we determine the explicit relevance feedback by identifying the context of questions. The second phase consists of a novel type of pseudo relevance feedback. Our method is based on a new term weighting function, called TRQ (Term Relatedness to Query), introduced in this work, which gives a score to the terms of the corpus according to their relatedness to the query. We also investigate how to apply our query expansion approach to documents from general domains. > Studying the use of morpho-syntactic knowledge in our approaches. For this purpose, we are interested in the extraction of terminology from log files. Thus, we introduce our approach, named Exterlog (EXtraction of TERminology from LOGs), to extract the terminology of log files. To evaluate the extracted terms and choose the most relevant ones, we propose a candidate term evaluation method using a measure based on the Web combined with statistical measures, taking the specialized context of log files into account.
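The thesis defines TRQ (Term Relatedness to Query); its exact formula is not reproduced in this abstract. The sketch below is therefore only a generic co-occurrence-based illustration of such a weighting: score corpus terms by how often they appear in passages that also contain query terms, normalised by their overall frequency. The passages, query, and the `trq_like` helper are all invented.

```python
from collections import Counter

passages = [
    "timing violation reported on clock path",
    "setup timing violation in clock tree synthesis",
    "power report generated for core block",
    "clock skew exceeds limit on critical path",
]
query = {"timing", "violation"}

def trq_like(passages, query):
    """Score each non-query term by P(term occurs near a query term)."""
    tokenized = [p.split() for p in passages]
    total = Counter(t for p in tokenized for t in p)
    with_query = Counter(
        t for p in tokenized if query & set(p) for t in p if t not in query
    )
    return {t: with_query[t] / total[t] for t in with_query}

scores = trq_like(passages, query)
# Terms confined to query-bearing passages score 1.0; terms that also
# occur elsewhere (e.g. "clock", "path") score lower.
print(sorted(scores, key=scores.get, reverse=True)[:3])
```

Terms like "setup" that only ever co-occur with the query get the highest weight, the behaviour the expansion step wants when adding context terms likely to appear in relevant passages.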
2

Zhou, Wubai. "Data Mining Techniques to Understand Textual Data". FIU Digital Commons, 2017. https://digitalcommons.fiu.edu/etd/3493.

Full text
Abstract
More than ever, information delivery and storage online rely heavily on text. Billions of texts are produced every day in the form of documents, news, logs, search queries, ad keywords, tags, tweets, messenger conversations, social network posts, etc. Text understanding is a fundamental and essential task spanning broad research topics, and it contributes to many applications in areas such as text summarization, search engines, recommendation systems, online advertising, conversational bots, and so on. However, understanding text is never a trivial task for computers, especially for noisy and ambiguous text such as logs and search queries. This dissertation focuses on textual understanding tasks derived from two domains, i.e., disaster management and IT service management, that mainly use textual data as an information carrier. Improving situation awareness in disaster management and alleviating the human effort involved in IT service management dictate more intelligent and efficient solutions for understanding the textual data acting as the main information carrier in the two domains. From the perspective of data mining, four directions are identified: (1) intelligently generating a storyline summarizing the evolution of a hurricane from a relevant online corpus; (2) automatically recommending resolutions according to the textual symptom description in a ticket; (3) gradually adapting the resolution recommendation system for time-correlated features derived from text; (4) efficiently learning distributed representations for short and lousy ticket symptom descriptions and resolutions. Provided with different types of textual data, the data mining techniques proposed in these four research directions successfully address our tasks of understanding and extracting valuable knowledge from textual data. My dissertation addresses the research topics outlined above. 
Concretely, I focus on designing and developing data mining methodologies to better understand textual information, including (1) a storyline generation method for efficient summarization of natural hurricanes based on a crawled online corpus; (2) a recommendation framework for automated ticket resolution in IT service management; (3) an adaptive recommendation system for time-varying, temporally correlated features derived from text; (4) a deep neural ranking model that not only successfully recommends resolutions but also efficiently outputs distributed representations for ticket descriptions and resolutions.
3

Al-Mutairy, Badr. "Data mining and integration of heterogeneous bioinformatics data sources". Thesis, Cardiff University, 2008. http://orca.cf.ac.uk/54178/.

Abstract
This thesis presents a novel approach to interoperability that uses relationship-based integration to link bioinformatics data sources: different relationship types, with different relationship-closeness values, link gene expression datasets with other information available in public bioinformatics data sources. These relationships provide flexible linkage for biologists to discover linked data across the biological universe. Relationship closeness is a variable that measures the closeness of the biological entities in a relationship and is a characteristic of the relationship. The novelty of this approach is that it allows a user to link a gene expression dataset with heterogeneous data sources dynamically and flexibly, facilitating comparative genomics investigations. Our research has demonstrated that using different relationships allows biologists to analyze experimental datasets in different ways, shortens the time needed to analyze the datasets, and provides an easier way to undertake this analysis. It thus gives biologists more power to run experiments with changing threshold values and linkage types. This is achieved in our framework by introducing the Soft Link Model (SLM) and a Relationship Knowledge Base (RKB), which is built and used by the SLM. The Integration and Data Mining Bioinformatics Data sources system (IDMBD) is implemented as a proof-of-concept prototype to demonstrate the linkage technique described in the thesis.
4

Ur-Rahman, Nadeem. "Textual data mining applications for industrial knowledge management solutions". Thesis, Loughborough University, 2010. https://dspace.lboro.ac.uk/2134/6373.

Abstract
In recent years, knowledge has become an important resource for enhancing business, and many activities are required to manage knowledge resources well and help companies remain competitive within industrial environments. The data available in most industrial setups is complex in nature, and multiple different data formats may be generated to track the progress of different projects, whether related to developing new products or to providing better services to customers. Knowledge discovery from different databases requires considerable effort, and data mining techniques serve this purpose by handling structured data formats. If, however, the data is semi-structured or unstructured, the combined efforts of data and text mining technologies may be needed to bring fruitful results. This thesis focuses on issues related to the discovery of knowledge from semi-structured or unstructured data formats through the application of textual data mining techniques to automate the classification of textual information into two different categories or classes, which can then be used to help manage the knowledge available in multiple data formats. Applications of different data mining techniques to discover valuable information and knowledge from the manufacturing and construction industries are explored as part of a literature review. The application of text mining techniques to handle semi-structured or unstructured data is discussed in detail. A novel integration of different data and text mining tools is proposed in the form of a framework in which knowledge discovery and its refinement are performed through the application of clustering and the Apriori association rule mining algorithm. Finally, the hypothesis that better classification accuracies can be achieved is examined by applying the methodology to case-study data available in the form of Post Project Review (PPR) reports.
The process of discovering useful knowledge, and of interpreting and utilising it, has been automated to classify the textual data into two classes.
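The Apriori step above mines frequent co-occurrences of terms across documents. As a rough illustration only, here is a minimal, self-contained Apriori sketch over tokenized documents; the toy corpus, threshold, and function names are our own assumptions, not taken from the thesis:

```python
def apriori(transactions, min_support):
    """Frequent itemset mining: return {itemset: support} for every itemset
    whose support (fraction of documents containing it) >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= doc for doc in transactions) / n

    # start from frequent 1-itemsets
    singletons = {frozenset([t]) for doc in transactions for t in doc}
    current = {s for s in singletons if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        # join step: combine frequent k-itemsets into candidate (k+1)-itemsets;
        # candidates with an infrequent subset are discarded by the support test
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent

# toy "documents" as sets of stemmed terms (illustrative only)
docs = [{"fever", "cough", "outbreak"},
        {"fever", "cough"},
        {"fever", "vaccine"},
        {"cough", "outbreak"}]
freq = apriori(docs, min_support=0.5)
```

From `freq`, association rules such as fever → cough can then be scored by confidence, i.e. support({fever, cough}) / support({fever}).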
5

Attanasio, Antonio. "Mining Heterogeneous Urban Data at Multiple Granularity Layers". Doctoral thesis, Politecnico di Torino, 2018. http://hdl.handle.net/11583/2709888.

Abstract
The recent development of urban areas and of new advanced services supported by digital technologies has generated big challenges for people and city administrators, such as air pollution, high energy consumption, traffic congestion, and the management of public events. Moreover, understanding how citizens perceive the provided services and other relevant topics can help devise targeted management actions. With the wide diffusion of sensing technologies and user devices, the capability to generate data of public interest within the urban area has grown rapidly. For instance, the different sensor networks deployed in an urban area allow collecting a variety of data useful for characterizing several aspects of the urban environment. The huge amount of data produced by different types of devices and applications carries rich knowledge about the urban context. Mining big urban data can provide decision makers with knowledge useful for tackling the aforementioned challenges for a smart and sustainable administration of urban spaces. However, the high volume and heterogeneity of the data increase the complexity of the analysis. Moreover, different sources provide data with different spatial and temporal references. The extraction of significant information from such diverse kinds of data also depends on how they are integrated; hence alternative data representations and efficient processing technologies are required. The PhD research activity presented in this thesis was aimed at tackling these issues. Indeed, the thesis deals with the analysis of big heterogeneous data in smart city scenarios, by means of new data mining techniques and algorithms, to study the nature of urban processes. The problem is addressed at both the infrastructural and the algorithmic layer. At the first layer, the thesis proposes enhancements to the current leading techniques for the storage and elaboration of Big Data.
Integration with novel computing platforms is also considered to support the parallelization of tasks, tackling the issue of automatic scaling of resources. At the algorithmic layer, the research activity aimed at innovating current data mining algorithms by adapting them to novel Big Data architectures and to Cloud computing environments. Such algorithms have been applied to various classes of urban data in order to discover hidden but important information to support the optimization of the related processes. This research activity focused on the development of a distributed framework to automatically aggregate heterogeneous data at multiple temporal and spatial granularities and to apply different data mining techniques. Parallel computations are performed according to the MapReduce paradigm, exploiting in-memory computing to reach near-linear computational scalability. By exploring manifold data resolutions in a relatively short time, several additional patterns can be discovered in the data, further enriching the description of urban processes. The framework is applied to different use cases, where many types of data are used to provide insightful descriptive and predictive analyses. In particular, the PhD activity addressed two main issues in the context of urban data mining: the evaluation of buildings' energy efficiency from different energy-related data, and the characterization of people's perception of and interest in different topics from user-generated content on social networks. For each use case within the considered applications, a specific architectural solution was designed to obtain meaningful and actionable results and to optimize the computational performance and scalability of the algorithms, which were extensively validated through experimental tests.
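The core idea of aggregating the same readings at several temporal granularities, which the thesis does in a distributed MapReduce setting, can be sketched on a single machine in a few lines; the record layout, granularity names, and choice of averaging are illustrative assumptions, not the thesis's framework:

```python
from collections import defaultdict

def aggregate(readings, granularities):
    """Average (hour, zone, value) readings at several temporal granularities.
    `granularities` maps a name to a bucket width in hours, e.g. {"day": 24};
    the result maps (granularity, bucket index, zone) -> mean value."""
    acc = defaultdict(lambda: [0.0, 0])          # running (sum, count) per key
    for hour, zone, value in readings:
        for name, width in granularities.items():
            key = (name, hour // width, zone)    # bucket index at this granularity
            acc[key][0] += value
            acc[key][1] += 1
    return {key: total / count for key, (total, count) in acc.items()}

# toy energy readings: (hour index, zone, kWh) -- illustrative only
readings = [(0, "A", 10.0), (1, "A", 20.0), (24, "A", 30.0)]
out = aggregate(readings, {"hour": 1, "day": 24})
```

In a MapReduce formulation, the inner loop becomes the map phase (emit one key per granularity) and the sum/count merge becomes the reduce phase, which is what makes the multi-granularity exploration parallelizable.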
6

Kubalík, Jakub. "Mining of Textual Data from the Web for Speech Recognition". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237170.

Abstract
The primary goal of this project was to study language modeling for speech recognition and techniques for obtaining textual data from the Web. The text presents basic speech recognition techniques and describes language models based on statistical methods in detail, paying particular attention to the criteria for evaluating the quality of language models and speech recognition systems. It further describes data mining models and techniques, especially information retrieval. The problems associated with obtaining data from the web are then presented, and the Google search engine is introduced in that context. Part of the project was the design and implementation of a system for retrieving text from the web, which is described in detail. The main goal of the work, however, was to verify whether data obtained from the Web can bring any benefit to speech recognition. The described techniques therefore seek the optimal way to use data obtained from the Web to improve both sample language models and models deployed in real recognition systems.
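The standard way to compare such statistical language models, with and without Web data, is perplexity on held-out text. Below is a generic textbook sketch of a bigram model with add-one smoothing, not the system built in the thesis; the toy corpus is invented for illustration:

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigram histories and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(toks[:-1])               # histories only
        bigrams.update(zip(toks, toks[1:]))
        vocab.update(toks[1:])                   # predictable tokens, incl. </s>
    return unigrams, bigrams, len(vocab)

def perplexity(sentences, model):
    unigrams, bigrams, v = model
    log_prob, n = 0.0, 0
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            # add-one (Laplace) smoothed conditional probability P(b | a)
            log_prob += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
            n += 1
    return math.exp(-log_prob / n)

model = train_bigram(["the cat sat", "the dog sat"])
seen = perplexity(["the cat sat"], model)
unseen = perplexity(["dog the sat cat"], model)   # scrambled word order
```

Interpolating counts from in-domain Web text with such a baseline model would, if the Web data helps, lower the perplexity measured on held-out transcripts.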
7

Nimmagadda, Shastri Lakshman. "Ontology based data warehousing for mining of heterogeneous and multidimensional data sources". Thesis, Curtin University, 2015. http://hdl.handle.net/20.500.11937/2322.

Abstract
Heterogeneous and multidimensional big-data sources are prevalent in virtually all business environments, yet system and data analysts are often unable to fast-track and access them. A robust and versatile data warehousing system is developed, integrating domain ontologies from multidimensional data sources. For example, petroleum digital ecosystems and digital oil field solutions, derived from big-data petroleum (information) systems, are in increasing demand in multibillion-dollar resource businesses worldwide. This work has been recognized by the IEEE Industrial Electronics Society and has appeared in more than 50 international conference proceedings and journals.
8

Preti, Giulia. "On the discovery of relevant structures in dynamic and heterogeneous data". Doctoral thesis, Università degli studi di Trento, 2019. http://hdl.handle.net/11572/242978.

Abstract
We are witnessing an explosion of available data coming from a huge number of sources and domains, which is leading to the creation of ever larger and richer datasets. Understanding, processing, and extracting useful information from those datasets requires specialized algorithms that take into consideration both the dynamism and the heterogeneity of the data they contain. Although several pattern mining techniques have been proposed in the literature, most of them fall short in providing interesting structures when the data can be interpreted differently from user to user, when it can change from time to time, and when it has different representations. In this thesis, we propose novel approaches that go beyond the traditional pattern mining algorithms, and can effectively and efficiently discover relevant structures in dynamic and heterogeneous settings. In particular, we address the tasks of pattern mining in multi-weighted graphs, pattern mining in dynamic graphs, and pattern mining in heterogeneous temporal databases. For pattern mining in multi-weighted graphs, we consider the problem of mining patterns in a new category of graphs called multi-weighted graphs. In these graphs, nodes and edges can carry multiple weights that represent, for example, the preferences of different users or applications, and that are used to assess the relevance of the patterns. We introduce a novel family of scoring functions that assign a score to each pattern based on both the weights of its appearances and their number, and that respect the anti-monotone property, pivotal for efficient implementations. We then propose a centralized and a distributed algorithm that solve the problem both exactly and approximately. The approximate solution has better scalability in terms of the number of edge weighting functions, while achieving good accuracy in the results found.
An extensive experimental study shows the advantages and disadvantages of our strategies, and proves their effectiveness. Then, for pattern mining in dynamic graphs, we focus on the particular task of discovering structures that are both well connected and correlated over time, in graphs where nodes and edges can change over time. These structures represent edges that are topologically close and exhibit a similar behavior of appearance and disappearance in the snapshots of the graph. To this aim, we introduce two measures for computing the density of a subgraph whose edges change in time, and a measure to compute their correlation. The density measures are able to detect subgraphs that are silent in some periods of time but highly connected in others, and thus they can detect events or anomalies that happened in the network. The correlation measure can identify groups of edges that tend to co-appear, as well as edges that are characterized by similar levels of activity. For both variants of the density measure, we provide an effective solution that enumerates all the maximal subgraphs whose density and correlation exceed given minimum thresholds, but can also return a more compact subset of representative subgraphs that exhibit high levels of pairwise dissimilarity. Furthermore, we propose an approximate algorithm that scales well with the size of the network, while achieving high accuracy. We evaluate our framework with an extensive set of experiments on both real and synthetic datasets, and compare its performance with the main competitor algorithm. The results confirm the correctness of the exact solution, the high accuracy of the approximate one, and the superiority of our framework over the existing solutions. In addition, they demonstrate the scalability of the framework and its applicability to networks of different natures.
Finally, we address the problem of entity resolution in heterogeneous temporal databases, which are datasets containing records that give different descriptions of the status of real-world entities at different periods of time, and thus are characterized by different sets of attributes that can change over time. Detecting records that refer to the same entity in such a scenario requires a record similarity measure that takes the temporal information into account and that is aware of the absence of a common fixed schema between the records. However, existing record matching approaches either ignore the dynamism in the attribute values of the records, or assume that all the records share the same set of attributes throughout time. In this thesis, we propose a novel time-aware, schema-agnostic similarity measure for temporal records to find pairs of matching records, and integrate it into an exact and an approximate algorithm. The exact algorithm can find all the maximal groups of pairwise similar records in the database. The approximate algorithm, on the other hand, can achieve higher scalability with the size of the dataset and the number of attributes, by relying on a technique called meta-blocking. This algorithm can find a good-quality approximation of the actual groups of similar records, by adopting an effective and efficient clustering algorithm.
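The flavor of a time-aware, schema-agnostic record similarity can be illustrated with a toy measure that compares attribute-value pairs regardless of schema and discounts by temporal distance; the Jaccard/decay combination and all names below are our own simplifying assumptions, not the measure proposed in the thesis:

```python
def record_similarity(rec_a, rec_b, decay=0.1):
    """rec = (timestamp, {attr: value, ...}). Schema-agnostic: compare the
    sets of 'attr=value' tokens, then discount by temporal distance."""
    ta, attrs_a = rec_a
    tb, attrs_b = rec_b
    tokens_a = {f"{k}={v}" for k, v in attrs_a.items()}
    tokens_b = {f"{k}={v}" for k, v in attrs_b.items()}
    if not tokens_a or not tokens_b:
        return 0.0
    jaccard = len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
    return jaccard / (1.0 + decay * abs(ta - tb))   # time-decay discount

# two snapshots of (possibly) the same company, with different attribute sets
r1 = (2010, {"name": "ACME", "city": "Rome"})
r2 = (2012, {"name": "ACME", "city": "Milan", "ceo": "Rossi"})
sim = record_similarity(r1, r2)
```

A meta-blocking step, as used by the approximate algorithm, would then restrict such pairwise comparisons to records sharing at least one token, instead of scoring every pair.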
10

Fang, Chunsheng. "Novel Frameworks for Mining Heterogeneous and Dynamic Networks". University of Cincinnati / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1321369978.


Books on the topic "Heterogeneous Textual Data Mining"

1

P, Deepak and Anna Jurek-Loughrey, eds. Linking and Mining Heterogeneous and Multi-view Data. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-01872-6.

2

Inmon, William H. Tapping into unstructured data: Integrating unstructured data and textual analytics into business intelligence. Upper Saddle River, NJ: Prentice Hall, 2008.

3

P, Deepak and Anna Jurek-Loughrey. Linking and Mining Heterogeneous and Multi-view Data. Springer, 2018.

4

Yu, Philip S. and Chuan Shi. Heterogeneous Information Network Analysis and Applications. Springer, 2018.

5

Yu, Philip S. and Chuan Shi. Heterogeneous Information Network Analysis and Applications. Springer, 2017.

6

MDS'13: 2013 Workshop on Mining Data Semantics in Heterogeneous Information Networks. Association for Computing Machinery, 2013.

7

Textual Data Science with R. Taylor & Francis Group, 2019.

8

Bécue-Bertaut, Mónica. Textual Data Science with R. Taylor & Francis Group, 2019.

9

Bécue-Bertaut, Mónica. Textual Data Science with R. Taylor & Francis Group, 2019.

10

Bécue-Bertaut, Mónica. Textual Data Science with R. Taylor & Francis Group, 2019.


Book chapters on the topic "Heterogeneous Textual Data Mining"

1

Yan, Xiaoqiang, Yiqiao Mao, Shizhe Hu and Yangdong Ye. "Heterogeneous Dual-Task Clustering with Visual-Textual Information". In Proceedings of the 2020 SIAM International Conference on Data Mining, 658–66. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2020. http://dx.doi.org/10.1137/1.9781611976236.74.

2

Grüger, Joscha, Tobias Geyer, Martin Kuhn, Stephan A. Braun and Ralph Bergmann. "Verifying Guideline Compliance in Clinical Treatment Using Multi-perspective Conformance Checking: A Case Study". In Lecture Notes in Business Information Processing, 301–13. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98581-3_22.

Abstract
Clinical guidelines support physicians in the evidence-based treatment of patients. The technical verification of guideline compliance is not trivial, since guideline knowledge is usually represented textually and none of the approaches to computer-interpretable guideline representation has yet been able to establish itself. Given the procedural nature of treatment sequences, this case study examines the applicability of a guideline process model to real hospital data for the verification of guideline compliance. For this purpose, the limitations and challenges in transforming clinical data into an event log and in applying conformance checking to align the data with the guideline reference model are investigated. As a data set, we use treatment data of skin tumor patients from a cancer registry, enriched with hospital information system data. The results show the difficulty of applying process mining to medically complex and heterogeneous data and the need for complex preprocessing. The variability of clinical processes makes the application of global conformance checking algorithms challenging. In addition, the work shows the semantic weakness of the alignments and the need for new semantically sensitive approaches.
3

Banchs, Rafael E. "Handling Textual Data". In Text Mining with MATLAB®, 15–32. New York, NY: Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-4151-9_2.

4

Poon, Leonard K. M., Chun Fai Leung and Nevin L. Zhang. "Mining Textual Reviews with Hierarchical Latent Tree Analysis". In Data Mining and Big Data, 401–8. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-61845-6_40.

5

Zhao, Qiang Li, Yan Huang Jiang and Ming Xu. "Incremental Learning by Heterogeneous Bagging Ensemble". In Advanced Data Mining and Applications, 1–12. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-17313-4_1.

6

Aggarwal, Charu C. "Joint Text Mining with Heterogeneous Data". In Machine Learning for Text, 235–58. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-73531-3_8.

7

Aggarwal, Charu C. "Joint Text Mining with Heterogeneous Data". In Machine Learning for Text, 233–56. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-96623-2_8.

8

Yang, Yan, Xiangjuan Yao and Dunwei Gong. "Clustering Study of Crowdsourced Test Report with Multi-source Heterogeneous Information". In Data Mining and Big Data, 135–45. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-32-9563-6_14.

9

Chen, Jiali, Kai Jiang, Rupeng Liang, Jing Wang, Shaoqiu Zheng and Ying Tan. "Heterogeneous Multi-unit Control with Curriculum Learning for Multi-agent Reinforcement Learning". In Data Mining and Big Data, 3–16. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-9297-1_1.

10

Shao, Hao, Bin Tong and Einoshin Suzuki. "Query by Committee in a Heterogeneous Environment". In Advanced Data Mining and Applications, 186–98. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-35527-1_16.


Conference proceedings on the topic "Heterogeneous Textual Data Mining"

1

Park, Jongmin, Seunghoon Han, Jong-Ryul Lee and Sungsu Lim. "Multi-Hyperbolic Space-Based Heterogeneous Graph Attention Network". In 2024 IEEE International Conference on Data Mining (ICDM), 815–20. IEEE, 2024. https://doi.org/10.1109/icdm59182.2024.00098.

2

Wang, Xuan, Yu Zhang, Aabhas Chauhan, Qi Li and Jiawei Han. "Textual Evidence Mining via Spherical Heterogeneous Information Network Embedding". In 2020 IEEE International Conference on Big Data (Big Data). IEEE, 2020. http://dx.doi.org/10.1109/bigdata50022.2020.9377958.

3

Fize, Jacques, Mathieu Roche and Maguelonne Teisseire. "Matching Heterogeneous Textual Data Using Spatial Features". In 2018 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2018. http://dx.doi.org/10.1109/icdmw.2018.00197.

4

Hutahaean, Junko and Kai Simon. "Use of Natural Language Processing and Computer Vision in Deep Learning for Equipment Failure Investigation on Drilling Tools". In International Petroleum Technology Conference. IPTC, 2025. https://doi.org/10.2523/iptc-24706-ms.

Abstract
Incident investigation analysis within the oil and gas industry is a critical process for ensuring operational safety, minimizing downtime, and improving asset management. However, the sheer volume and heterogeneous nature of the data sources (including structured and unstructured text and visual information) pose significant challenges to traditional methods of incident classification and contextual understanding, which are labor-intensive and error-prone. This paper addresses these challenges by proposing an approach that harnesses natural language processing (NLP) and computer vision techniques in deep learning for equipment failure investigation analysis in drilling tools. The first component of our approach focuses on leveraging NLP for automated incident classification from a mixture of structured and unstructured text data within the oil and gas industry. With vast volumes of data generated from maintenance logs, technician reports, and incident summaries, manual incident classification becomes impractical and error-prone. By applying advanced NLP algorithms, including text mining and sentiment analysis, we automate the process of categorizing incidents, enabling real-time prioritization and deeper semantic analysis. The second component introduces a novel application of computer vision, where we employ deep learning-based techniques to detect and extract textual information from images captured on various electronic boards. By training models on annotated image datasets, our methodology facilitates the extraction of textual content from diverse electronic boards, enriching the incident investigation process with valuable insights. Our NLP methodology analyzes the textual content of diverse data sources and enables rapid identification, categorization, and prioritization of critical incidents.
By automating text detection from visual electronic board sources, the computer vision model built in this study enhances incident data collection, improves incident context understanding, facilitates efficient information extraction, and facilitates more accurate root cause analysis. Through empirical validation and case studies, we demonstrate the efficacy and novelty of our integrated approach. Our methodology streamlines incident investigation analysis by automating incident classification and text extraction from visual sources, providing deeper insights into incident contexts, and enabling more informed decision-making. This scalable and effective solution improves incident response, enhances operational safety, and preserves asset integrity within the oil and gas sector, offering a transformative approach to complex incident analysis challenges.
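The incident classification component described above is a supervised text categorization task. As a hedged, minimal stand-in for the paper's deep NLP pipeline, here is a multinomial Naive Bayes classifier over bag-of-words counts; the toy reports, labels, and class names are invented for illustration:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)   # per-class word frequencies
        self.class_counts = Counter(labels)       # class priors
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        def log_score(label):
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
            for w in text.lower().split():
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            return score
        return max(self.class_counts, key=log_score)

# toy failure reports with hypothetical categories (illustrative only)
clf = NaiveBayesClassifier().fit(
    ["motor failure during drilling", "sensor board short circuit",
     "drill string stuck downhole", "corrupted firmware on control board"],
    ["mechanical", "electronic", "mechanical", "electronic"])
pred = clf.predict("stuck drill motor")
```

A production pipeline like the one in the paper would replace the bag-of-words likelihoods with learned text representations, but the classify-then-prioritize structure is the same.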
5

Roche, Mathieu and Maguelonne Teisseire. "Integrating Textual Data into Heterogeneous Data Ingestion Processing". In 2021 IEEE International Conference on Big Data (Big Data). IEEE, 2021. http://dx.doi.org/10.1109/bigdata52589.2021.9671759.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
6. "Knowledge Graph Extraction from Textual Data Using LLM". In Data Mining and Data Warehouses – SiKDD 2024. Jožef Stefan Institute, 2024. http://dx.doi.org/10.70314/is.2024.sikdd.15.
7. Caputo, G. M. and N. F. F. Ebecken. "Computational system for the textual processing of industrial patents". In DATA MINING AND MIS 2006. Southampton, UK: WIT Press, 2006. http://dx.doi.org/10.2495/data060171.
8. Tan, Pang-Ning, Hannah Blau, Steve Harp and Robert Goldman. "Textual data mining of service center call records". In Proceedings of the sixth ACM SIGKDD international conference. New York, New York, USA: ACM Press, 2000. http://dx.doi.org/10.1145/347090.347177.
9. Michalenko, Joshua J., Andrew S. Lan and Richard G. Baraniuk. "Data-Mining Textual Responses to Uncover Misconception Patterns". In L@S 2017: Fourth (2017) ACM Conference on Learning @ Scale. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3051457.3053996.
10. Xu, Jia. "Joint Visual and Textual Mining on Social Media". In 2014 IEEE International Conference on Data Mining Workshop (ICDMW). IEEE, 2014. http://dx.doi.org/10.1109/icdmw.2014.114.

Reports on the topic "Heterogeneous Textual Data Mining"

1. Dooley, Kevin, Steven Corman and Dan Ballard. Centering Resonance Analysis: A Superior Data Mining Algorithm for Textual Data Streams. Fort Belvoir, VA: Defense Technical Information Center, March 2004. http://dx.doi.org/10.21236/ada422048.
2. Zinilli, Antonio. Text Mining in Action: Tools and Techniques using Python. Instats Inc., 2024. http://dx.doi.org/10.61700/k4powzm518m5z1739.
Abstract
This seminar provides a comprehensive exploration of text mining techniques using Python, tailored for academic researchers seeking to analyze large textual datasets effectively. Participants will gain hands-on experience with Python libraries and methodologies for natural language processing, sentiment analysis, topic modeling, text classification, and more, enhancing their data analysis capabilities across various disciplines.
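The kind of basic corpus exploration such a seminar covers can be sketched with nothing but the standard library; the stopword list and example documents below are illustrative assumptions, and real analyses would typically use libraries such as NLTK, spaCy, or scikit-learn.

```python
import re
from collections import Counter

# Minimal stopword list for the sketch; real work would use a curated one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}

def top_terms(docs, k=5):
    """Return the k most frequent non-stopword terms across a corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(w for w in re.findall(r"[a-z]+", doc.lower())
                      if w not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]
```

Term frequencies like these are the starting point for the topic modeling and text classification techniques the seminar goes on to cover.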
3. Zambrano, Omar, Denisse Laos and Marcos Robles. Global boom, local impacts: Mining revenues and subnational outcomes in Peru 2007-2011. Inter-American Development Bank, May 2014. http://dx.doi.org/10.18235/0011633.
Abstract
The relationship between the abundance of natural resources and socio-economic performance has been a main object of study in the economic development field since Adam Smith. Dominated by attempts to verify the so-called natural resource curse, the mainstream literature on the topic has focused mostly on cross-sectional data at the national level, with limited empirical use of exogenous differences in the abundance of natural resources at the subnational level. We explore the case of Peru, a mining-rich middle-income country where, exploiting a unique data set constructed for this purpose, we are able to assess systematic differences in district-level welfare outcomes between mining and non-mining districts. We find evidence that the condition of being a mining-abundant district has a significant impact on the pace of reduction of poverty rates and inequality levels. We also estimate a heterogeneous response to the mining-abundant condition, finding stronger responses in lower-poverty, higher-inequality districts. Finally, we find a trend suggesting incremental positive marginal effects of the level of exposure to mining transfers, as a proxy for the degree of abundance of mining activities, on the reduction of poverty and inequality.
4. Ansari, S. M., E. M. Schetselaar and J. A. Craven. Three-dimensional magnetotelluric modelling of the Lalor volcanogenic massive-sulfide deposit, Manitoba. Natural Resources Canada/CMSS/Information Management, 2022. http://dx.doi.org/10.4095/328003.
Abstract
Unconstrained magnetotelluric inversion commonly produces insufficient inherent resolution to image ore-system fluid pathways that were structurally thinned during post-emplacement tectonic activity. To improve the resolution in these complex environments, we synthesized the 3-D magnetotelluric (MT) response for geologically realistic models using a finite-element-based forward-modelling tool with unstructured meshes and applied it to the Lalor volcanogenic massive-sulfide deposit in the Snow Lake mining camp, Manitoba. This new tool is based on mapping interpolated or simulated resistivity values from wireline logs onto unstructured tetrahedral meshes to reflect, with the help of 3-D models obtained from lithostratigraphic and lithofacies drillhole logs, the complexity of the host-rock geological structure. The resulting stochastic model provides a more realistic representation of the heterogeneous spatial distribution of the electric resistivity values around the massive, stringer, and disseminated sulfide ore zones. Both models were combined into one seamless tetrahedral mesh of the resistivity field. To capture the complex resistivity distribution in the geophysical forward model, a finite-element code was developed. Comparative analyses of the forward models with MT data acquired at the Earth's surface show a reasonable agreement that explains the regional variations associated with the host rock geological structure and detects the local anomalies associated with the MT response of the ore zones.
5. de Kemp, E. A., H. A. J. Russell, B. Brodaric, D. B. Snyder, M. J. Hillier, M. St-Onge, C. Harrison et al. Initiating transformative geoscience practice at the Geological Survey of Canada: Canada in 3D. Natural Resources Canada/CMSS/Information Management, 2022. http://dx.doi.org/10.4095/331097.
Abstract
Application of 3D technologies to the wide range of Geosciences knowledge domains is well underway. These have been operationalized in workflows of the hydrocarbon sector for a half-century, and now in mining for over two decades. In Geosciences, algorithms, structured workflows and data integration strategies can support compelling Earth models, however challenges remain to meet the standards of geological plausibility required for most geoscientific studies. There is also missing links in the institutional information infrastructure supporting operational multi-scale 3D data and model development. Canada in 3D (C3D) is a vision and road map for transforming the Geological Survey of Canada's (GSC) work practice by leveraging emerging 3D technologies. Primarily the transformation from 2D geological mapping, to a well-structured 3D modelling practice that is both data-driven and knowledge-driven. It is tempting to imagine that advanced 3D computational methods, coupled with Artificial Intelligence and Big Data tools will automate the bulk of this process. To effectively apply these methods there is a need, however, for data to be in a well-organized, classified, georeferenced (3D) format embedded with key information, such as spatial-temporal relations, and earth process knowledge. Another key challenge for C3D is the relative infancy of 3D geoscience technologies for geological inference and 3D modelling using sparse and heterogeneous regional geoscience information, while preserving the insights and expertise of geoscientists maintaining scientific integrity of digital products. In most geological surveys, there remains considerable educational and operational challenges to achieve this balance of digital automation and expert knowledge. Emerging from the last two decades of research are more efficient workflows, transitioning from cumbersome, explicit (manual) to reproducible implicit semi-automated methods. 
They are characterized by integrated and iterative, forward and reverse geophysical modelling, coupled with stratigraphic and structural approaches. The full impact of research and development with these 3D tools, geophysical-geological integration and simulation approaches is perhaps unpredictable, but the expectation is that they will produce predictive, instructive models of Canada's geology that will be used to educate, prioritize and influence sustainable policy for stewarding our natural resources. On the horizon are 3D geological modelling methods spanning the gulf between local and frontier or green-fields, as well as deep crustal characterization. These are key components of mineral systems understanding, integrated and coupled hydrological modelling and energy transition applications, e.g. carbon sequestration, in-situ hydrogen mining, and geothermal exploration. Presented are some case study examples at a range of scales from our efforts in C3D.
6. de Kemp, E. A., H. A. J. Russell, B. Brodaric, D. B. Snyder, M. J. Hillier, M. St-Onge, C. Harrison et al. Initiating transformative geoscience practice at the Geological Survey of Canada: Canada in 3D. Natural Resources Canada/CMSS/Information Management, 2023. http://dx.doi.org/10.4095/331871.
Abstract
Application of 3D technologies to the wide range of Geosciences knowledge domains is well underway. These have been operationalized in workflows of the hydrocarbon sector for a half-century, and now in mining for over two decades. In Geosciences, algorithms, structured workflows and data integration strategies can support compelling Earth models; however, challenges remain to meet the standards of geological plausibility required for most geoscientific studies. There are also missing links in the institutional information infrastructure supporting operational multi-scale 3D data and model development. Canada in 3D (C3D) is a vision and road map for transforming the Geological Survey of Canada's (GSC) work practice by leveraging emerging 3D technologies, primarily the transformation from 2D geological mapping to a well-structured 3D modelling practice that is both data-driven and knowledge-driven. It is tempting to imagine that advanced 3D computational methods, coupled with Artificial Intelligence and Big Data tools, will automate the bulk of this process. To effectively apply these methods there is a need, however, for data to be in a well-organized, classified, georeferenced (3D) format embedded with key information, such as spatial-temporal relations, and earth process knowledge. Another key challenge for C3D is the relative infancy of 3D geoscience technologies for geological inference and 3D modelling using sparse and heterogeneous regional geoscience information, while preserving the insights and expertise of geoscientists and maintaining the scientific integrity of digital products. In most geological surveys, there remain considerable educational and operational challenges to achieve this balance of digital automation and expert knowledge. Emerging from the last two decades of research are more efficient workflows, transitioning from cumbersome, explicit (manual) to reproducible implicit semi-automated methods.
They are characterized by integrated and iterative, forward and reverse geophysical modelling, coupled with stratigraphic and structural approaches. The full impact of research and development with these 3D tools, geophysical-geological integration and simulation approaches is perhaps unpredictable, but the expectation is that they will produce predictive, instructive models of Canada's geology that will be used to educate, prioritize and influence sustainable policy for stewarding our natural resources. On the horizon are 3D geological modelling methods spanning the gulf between local and frontier or green-fields, as well as deep crustal characterization. These are key components of mineral systems understanding, integrated and coupled hydrological modelling and energy transition applications, e.g. carbon sequestration, in-situ hydrogen mining, and geothermal exploration. Presented are some case study examples at a range of scales from our efforts in C3D.
