
Dissertations / Theses on the topic 'Analytical solutions for data mining'


Consult the top 50 dissertations / theses for your research on the topic 'Analytical solutions for data mining.'


You can also download the full text of each academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations and theses across a wide variety of disciplines and organise your bibliography correctly.

1

Reinartz, Thomas [Verfasser]. "Focusing solutions for data mining : analytical studies and experimental results in real world domains / T. Reinartz." Berlin, 1999. http://d-nb.info/965635090/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Yang, Zhao. "Spatial Data Mining Analytical Environment for Large Scale Geospatial Data." ScholarWorks@UNO, 2016. http://scholarworks.uno.edu/td/2284.

Abstract:
Nowadays, many applications are continuously generating large-scale geospatial data. Vehicle GPS tracking, aerial surveillance drones, LiDAR (Light Detection and Ranging), world-wide spatial networks, and high-resolution optical or Synthetic Aperture Radar imagery all generate huge amounts of geospatial data. However, as data collection increases, our ability to process this large-scale geospatial data in a flexible fashion remains limited. We propose a framework for processing and analyzing large-scale geospatial and environmental data using a “Big Data” infrastructure. Existing Big Data solutions do not include a specific mechanism to analyze large-scale geospatial data. In this work, we extend HBase with a spatial index (R-tree) and HDFS to support geospatial data, and demonstrate its analytical use with some common geospatial data types and the data mining technology provided by the R language. The resulting framework has a robust capability to analyze large-scale geospatial data using spatial data mining and makes its outputs available to end users.
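The thesis above extends HBase with an R-tree index; that implementation is not reproduced here, but the core idea of a spatial index (bucketing points so a bounding-box query avoids a full scan) can be sketched in a few lines. This is a generic illustrative sketch in Python, not code from the thesis: the `GridIndex` class and its methods are hypothetical names, and a simple grid index stands in for an R-tree for brevity.

```python
from collections import defaultdict

class GridIndex:
    """Minimal grid-based spatial index: buckets points by cell so a
    bounding-box query only inspects cells that overlap the box."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cx, cy) -> [(id, x, y), ...]

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, point_id, x, y):
        self.cells[self._cell(x, y)].append((point_id, x, y))

    def query(self, xmin, ymin, xmax, ymax):
        """Return ids of points inside the bounding box."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for pid, x, y in self.cells[(cx, cy)]:
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(pid)
        return hits

idx = GridIndex(cell_size=10.0)
idx.insert("a", 1.0, 1.0)
idx.insert("b", 55.0, 42.0)
idx.insert("c", 5.0, 8.0)
```

An R-tree refines the same idea by grouping nearby objects under nested bounding rectangles, which handles skewed data better than a fixed grid.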
3

Li, Chenghui. "Data mining for direct marketing, problems and solutions." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ39847.pdf.

4

Ur-Rahman, Nadeem. "Textual data mining applications for industrial knowledge management solutions." Thesis, Loughborough University, 2010. https://dspace.lboro.ac.uk/2134/6373.

Abstract:
In recent years, knowledge has become an important resource for enhancing business, and many activities are required to manage these knowledge resources well and help companies remain competitive within industrial environments. The data available in most industrial setups is complex in nature, and multiple different data formats may be generated to track the progress of different projects, whether related to developing new products or to providing better services to customers. Knowledge discovery from different databases requires considerable effort, and data mining techniques serve this purpose by handling structured data formats. If, however, the data is semi-structured or unstructured, the combined efforts of data and text mining technologies may be needed to bring fruitful results. This thesis focuses on issues related to the discovery of knowledge from semi-structured or unstructured data formats through the application of textual data mining techniques to automate the classification of textual information into two different categories or classes, which can then be used to help manage the knowledge available in multiple data formats. Applications of different data mining techniques to discover valuable information and knowledge in the manufacturing and construction industries are explored as part of a literature review. The application of text mining techniques to handle semi-structured or unstructured data is discussed in detail. A novel integration of different data and text mining tools is proposed in the form of a framework in which knowledge discovery and its refinement are performed through the application of clustering and the Apriori association rule mining algorithm. Finally, the hypothesis of acquiring better classification accuracy is tested by applying the methodology to case study data available in the form of Post Project Review (PPR) reports. The process of discovering useful knowledge, its interpretation and utilisation, has been automated to classify the textual data into two classes.
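The framework just described combines clustering with Apriori association rule mining. As a rough illustration of the Apriori step only (not the author's implementation), the following Python sketch mines frequent itemsets by repeatedly extending itemsets that meet a minimum support threshold; the `apriori` function and the sample transactions are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return a dict mapping frequent itemsets (frozensets) to their support.

    transactions: list of sets of items; min_support: fraction in [0, 1].
    """
    n = len(transactions)
    # Level 1 candidates: every distinct item as a singleton itemset.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: cnt / n for c, cnt in counts.items()
                     if cnt / n >= min_support}
        frequent.update(survivors)
        # Join surviving k-itemsets into (k+1)-itemset candidates.
        keys = list(survivors)
        current = {a | b for a, b in combinations(keys, 2)
                   if len(a | b) == len(a) + 1}
    return frequent

transactions = [{"milk", "bread"}, {"milk", "bread", "eggs"},
                {"bread"}, {"milk", "eggs"}]
result = apriori(transactions, min_support=0.5)
```

In a text mining setting such as the PPR reports above, each "transaction" would be the set of terms extracted from one document, and frequent co-occurring term sets feed the rule-generation stage.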
5

Cranley, Nikki. "Challenges and Solutions for Complex Gigabit FTI Networks." International Foundation for Telemetering, 2011. http://hdl.handle.net/10150/595664.

Abstract:
ITC/USA 2011 Conference Proceedings / The Forty-Seventh Annual International Telemetering Conference and Technical Exhibition / October 24-27, 2011 / Bally's Las Vegas, Las Vegas, Nevada

This paper presents a case study of an FTI system with complex requirements in terms of data acquisition, recording, and post-analysis. Gigabit Ethernet was the technology of choice to facilitate such a system. Recording in a Gigabit Ethernet environment raises a fresh challenge: performing fast data reduction and data mining for post-flight analysis. This paper describes the Quick Access Recorder used in this system and how it addresses this challenge.
6

Laurinen, P. (Perttu). "A top-down approach for creating and implementing data mining solutions." Doctoral thesis, University of Oulu, 2006. http://urn.fi/urn:isbn:9514281268.

Abstract:
The information age is characterized by ever-growing amounts of data surrounding us. By transforming this data into usable knowledge we can start moving toward the knowledge age. Data mining is the science of transforming measurable information into usable knowledge. During the data mining process, the measurements pass through a chain of sophisticated transformations in order to acquire knowledge. Furthermore, in some applications the results are implemented as software solutions so that they can be continuously utilized. It is evident that the quality and amount of the knowledge formed are highly dependent on the transformations and the process applied. This thesis presents an application-independent concept that can be used for managing the data mining process and implementing the acquired results as software applications. The developed concept is divided into two parts – solution formation and solution implementation. The first part presents a systematic way for finding a data mining solution from a set of measurement data. The developed approach allows for easier application of a variety of algorithms to the data, manages the work chain, and differentiates between the data mining tasks. The method is based on storage of the data between the main stages of the data mining process, where the different stages of the process are defined on the basis of the type of algorithms applied to the data. The efficiency of the process is demonstrated with a case study presenting new solutions for resistance spot welding quality control. The second part of the concept presents a component-based data mining application framework, called Smart Archive, designed for implementing the solution. The framework provides functionality that is common to most data mining applications and is especially suitable for implementing applications that process continuously acquired measurements.
The work also proposes an efficient algorithm for utilizing cumulative measurement data in the history component of the framework. Using the framework, it is possible to build high-quality data mining applications with shorter development times by configuring the framework to process application-specific data. The efficiency of the framework is illustrated using a case study presenting the results and implementation principles of an application developed for predicting steel slab temperatures in a hot strip mill. In conclusion, this thesis presents a concept that proposes solutions for two fundamental issues of data mining, the creation of a working data mining solution from a set of measurement data and the implementation of it as a stand-alone application.
7

Javaid, Muhammad Athar [Verfasser], and Wolfgang [Akademischer Betreuer] Keller. "Data mining in GRACE monthly solutions / Muhammad Athar Javaid ; Betreuer: Wolfgang Keller." Stuttgart : Universitätsbibliothek der Universität Stuttgart, 2019. http://d-nb.info/1186063777/34.

8

Schwarz, Holger. "Integration von Data-Mining und online analytical processing : eine Analyse von Datenschemata, Systemarchitekturen und Optimierungsstrategien /." [S.l. : s.n.], 2003. http://www.bsz-bw.de/cgi-bin/xvms.cgi?SWB10720634.

9

He, Jianyi. "THE COMMERCIAL IMPACT ON BUSINESS MODELS OF MEDICAL IMAGING SOLUTIONS THROUGH DATA-ANALYTICAL METHODOLOGIES." Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1620233525109266.

10

Techaplahetvanich, Kesaraporn. "A visualization framework for exploring correlations among attributes of a large dataset and its applications in data mining." University of Western Australia. School of Computer Science and Software Engineering, 2007. http://theses.library.uwa.edu.au/adt-WU2007.0216.

Abstract:
[Truncated abstract] Many databases in scientific and business applications have grown exponentially in size in recent years. Accessing and using databases is no longer a specialized activity as more and more ordinary users without any specialized knowledge are trying to gain information from databases. Both expert and ordinary users face significant challenges in understanding the information stored in databases. The databases are so large in most cases that it is impossible to gain useful information by inspecting data tables, which are the most common form of storing data in relational databases. Visualization has emerged as one of the most important techniques for exploring data stored in large databases. Appropriate visualization techniques can reveal trends, correlations and associations in data that are very difficult to understand from a textual representation of the data. This thesis presents several new frameworks for data visualization and visual data mining. The first technique, VisEx, is useful for visual exploration of large multi-attribute datasets and especially for exploring the correlations among the attributes in such datasets. Most previous visualization techniques can display correlations among two or three attributes at a time without excessive screen clutter. ... Although many algorithms for mining association rules have been researched extensively, they do not incorporate users in the process and most of them generate a large number of association rules. It is quite often difficult for the user to analyze a large number of rules to identify a small subset of rules that is of importance to the user. In this thesis I present a framework for the user to interactively mine association rules visually. Another challenging task in data mining is to understand the correlations among the mined association rules. It is often difficult to identify a relevant subset of association rules from a large number of mined rules. 
A further contribution of this thesis is a simple framework in the VisAR system that allows the user to explore a large number of association rules visually. A variety of businesses have adopted new technologies for storing large amounts of data. Analysis of historical data quite often offers new insights into business processes that may increase productivity and profit. On-line analytical processing (OLAP) has become a powerful tool for business analysts to explore historical data. Effective visualization techniques are very important for supporting OLAP technology. A new technique for the visual exploration of OLAP data cubes is also presented in this thesis.
11

Mofidi, Reza. "Data mining and associated analytical tools as decision aids for healthcare practitioners in vascular surgery." Thesis, University of Sunderland, 2018. http://sure.sunderland.ac.uk/9553/.

Abstract:
Vascular surgery is an increasingly data-rich speciality. Planning treatment and assessing outcomes are highly dependent on the objective assessment of a number of imaging modalities, including duplex ultrasound, CT scans and angiograms, which are almost exclusively digitally created, stored and accessed. Developments such as the National Vascular Registry mean that treatment outcomes are recorded and scrutinised electronically. The widespread availability of data which is collected electronically and stored for future clinical use has created the opportunity to examine the efficacy of investigations and treatments in a way which has hitherto not been possible. In addition, new computational methods for data analysis have provided the opportunity for clinicians and researchers to utilise this data to address pertinent clinical questions.
12

Jenkins, J. Craig, and Thomas V. Maher. "What Should We Do about Source Selection in Event Data? Challenges, Progress, and Possible Solutions." ROUTLEDGE JOURNALS, TAYLOR & FRANCIS LTD, 2016. http://hdl.handle.net/10150/621502.

Abstract:
The prospect of using the Internet and other Big Data methods to construct event data promises to transform the field but is stymied by the lack of a coherent strategy for addressing the problem of selection. Past studies have shown that event data have significant selection problems. In terms of conventional standards of representativeness, all event data have some unknown level of selection no matter how many sources are included. We summarize recent studies of news selection and outline a strategy for reducing the risks of possible selection bias, including techniques for generating multisource event inventories, estimating larger populations, and controlling for nonrandomness. These build on a relativistic strategy for addressing event selection and the recognition that no event data set can ever be declared completely free of selection bias.
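One of the techniques the authors point to, estimating a larger event population from overlapping sources, is commonly done with two-source capture-recapture. The sketch below is an illustration of that idea rather than code from the article: it applies the Chapman bias-corrected form of the Lincoln-Petersen estimator, inferring total population size from two source lists and their overlap; the function name and inputs are hypothetical.

```python
def chapman_estimate(source_a, source_b):
    """Two-source capture-recapture estimate of the total number of events.

    Treats the event ids reported by each source as a "capture" sample;
    the overlap between samples calibrates how much both sources missed.
    Uses the Chapman bias-corrected estimator:
        N_hat = (nA + 1)(nB + 1) / (m + 1) - 1
    where m is the number of events seen by both sources.
    """
    a, b = set(source_a), set(source_b)
    overlap = len(a & b)
    return (len(a) + 1) * (len(b) + 1) / (overlap + 1) - 1

# Hypothetical example: two news sources each report some protest events,
# identified here by arbitrary event ids.
estimate = chapman_estimate([1, 2, 3, 4], [3, 4, 5, 6])
```

The estimator assumes the two sources select events independently; as the article stresses, that assumption rarely holds exactly for news media, which is why multi-source inventories and controls for nonrandom selection are needed on top of it.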
13

He, Xin. "A semi-automated framework for the analytical use of gene-centric data with biological ontologies." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/25505.

Abstract:
Motivation: Translational bioinformatics (TBI) has been defined as ‘the development and application of informatics methods that connect molecular entities to clinical entities’ [1], and has emerged as a systems theory approach to bridge the huge wealth of biomedical data into clinical actions using a combination of innovations and resources across the entire spectrum of biomedical informatics approaches [2]. The challenge for TBI is the availability of both comprehensive gene-centric knowledge and the corresponding tools that allow its analysis and exploitation. Traditionally, biological researchers usually study one or only a few genes at a time, but in recent years high-throughput technologies such as gene expression microarrays, protein mass spectrometry and next-generation DNA and RNA sequencing have emerged that allow the simultaneous measurement of changes on a genome-wide scale. These technologies usually result in large lists of interesting genes, but meaningful biological interpretation remains a major challenge. Over the last decade, enrichment analysis has become standard practice in the analysis of such gene lists, enabling systematic assessment of the likelihood of differential representation of defined groups of genes compared to suitably annotated background knowledge. The success of such analyses is highly dependent on the availability and quality of the gene annotation data. For many years, genes were annotated by different experts using inconsistent, non-standard terminologies. Large amounts of variation and duplication in these unstructured annotation sets made them unsuitable for principled quantitative analysis. More recently, a lot of effort has been put into the development and use of structured, domain-specific vocabularies to annotate genes. The Gene Ontology is one of the most successful examples of this, where genes are annotated with terms from three main clades: biological process, molecular function and cellular component.
However, there are many other established and emerging ontologies to aid biological data interpretation, but they are rarely used. For the same reason, many bioinformatic tools only support analysis using the Gene Ontology. The lack of annotation coverage, and of support in existing analytical tools to aid biological interpretation of data, has become a major limitation to their utility and uptake. Thus, automatic approaches are needed to facilitate the transformation of unstructured data to unlock the potential of all ontologies, with corresponding bioinformatics tools to support their interpretation. Approaches: In this thesis, firstly, similar to the approach in [3,4], I propose a series of computational approaches, implemented in a new tool, OntoSuite-Miner, to address the ontology-based gene association data integration challenge. This approach uses NLP-based text mining methods for ontology-based biomedical text mining. What differentiates my approach from others is that I integrate two of the most widely used NLP modules into the framework, not only increasing the confidence of the text mining results but also providing an annotation score for each mapping, based on the number of pieces of evidence in the literature and the number of NLP modules that agreed with the mapping. Since heterogeneous data is important in understanding human disease, the approach was designed to be generic, so the ontology-based annotation generation can be applied to different sources and can be repeated with different ontologies. Secondly, in respect of the second challenge posed by TBI, to increase the statistical power of annotation enrichment analysis, I propose OntoSuite-Analytics, which integrates a collection of enrichment analysis methods into a unified open-source software package named topOnto, in the statistical programming language R.
The package supports enrichment analysis across multiple ontologies with a set of implemented statistical/topological algorithms, allowing the comparison of enrichment results across multiple ontologies and between different algorithms. Results: The methodologies described above were implemented, and a Human Disease Ontology (HDO) based gene annotation database was generated by mining three publicly available databases: OMIM, GeneRIF and Ensembl variation. With the availability of the HDO annotation and the corresponding ontology enrichment analysis tools in topOnto, I profiled 277 gene classes with human diseases and generated ‘disease environments’ for 1310 human diseases. The exploration of the disease profiles and disease environments provides an overview of known disease knowledge and new insights into disease mechanisms. The integration of multiple ontologies into a disease context demonstrates how ‘orthogonal’ ontologies can lead to biological insight that would have been missed by more traditional single-ontology analysis.
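The enrichment analysis described above typically rests on a hypergeometric (Fisher-style) test: given a universe of genes, how surprising is the observed number of annotated genes in a study list? The sketch below is a generic Python illustration of that statistic, not topOnto code (topOnto is implemented in R); the function name and parameters are hypothetical.

```python
from math import comb

def enrichment_p_value(universe, annotated, study, hits):
    """One-sided hypergeometric p-value for term enrichment.

    universe:  total number of genes in the background
    annotated: genes in the universe carrying the ontology term
    study:     size of the study gene list
    hits:      annotated genes observed in the study list

    Returns P(X >= hits) for X ~ Hypergeometric(universe, annotated, study),
    i.e. the chance of seeing at least this many annotated genes by chance.
    """
    total = comb(universe, study)
    return sum(
        comb(annotated, k) * comb(universe - annotated, study - k)
        for k in range(hits, min(annotated, study) + 1)
    ) / total

# Hypothetical example: universe of 20 genes, 5 carry the term,
# and all 5 genes in the study list carry it.
p = enrichment_p_value(universe=20, annotated=5, study=5, hits=5)
```

A small p-value indicates the term is over-represented in the study list; in practice tools also correct for testing many terms at once (e.g. Benjamini-Hochberg).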
14

Portocarrero-Urdanivia, Cristhian, Angela Ochoa-Cuentas, Luis Arauzo-Gallardo, and Carlos Raymundo. "Hydraulic Fill Assessment Model Using Weathered Granitoids Based on Analytical Solutions to Mitigate Rock Mass Instability in Conventional Underground Mining." Universidad Peruana de Ciencias Aplicadas (UPC), 2021. http://hdl.handle.net/10757/653789.

Abstract:
The full text of this work is not available in the UPC Academic Repository due to restrictions imposed by the publisher.

This study uses analytical solutions to assess a hydraulic fill model based on weathered granitoid to increase underground opening stability and mitigate rock bursts during mining operations at a conventional underground mining company located in the Coastal Batholiths of the Peruvian Andes. The study assesses the previous geological database provided by the mine, analyzes the on-site strengths produced by the exploitation works that will subsequently be filled, identifies the quality of the material used in the fill (granitoids) through laboratory tests, and compares compressive strength at different depths, all within the fill model used. The study focuses on the applicability of hydraulic fills in a conventional underground mine using natural geological material such as granitoid.

Peer reviewed.
15

Bispo, Carlos Alberto Ferreira. "Uma análise da nova geração de sistemas de apoio à decisão." Universidade de São Paulo, 1998. http://www.teses.usp.br/teses/disponiveis/18/18140/tde-04042004-152849/.

Abstract:
This dissertation takes three approaches. The first presents the elements necessary to better understand the current scenario in which those responsible for the decision-making process in companies find themselves, covering the evolution of the decision-making process and its support, its phases, and its influencing factors. The second approach concerns the three tools that constitute the new generation of Decision Support Systems. The first tool is the data warehouse, a database built specifically for managerial purposes that is independent of the operational databases. The second tool is On-Line Analytical Processing (OLAP), used to carry out sophisticated analyses that allow its users a better understanding of the business conducted in the company. The last tool is data mining, which allows the analysis of data stored over many years in order to discover hidden relationships among the data, revealing purchasing and customer profiles; in this way, the information obtained can be turned into business strategy. By covering these three new tools, the aim is to analyze the most advanced techniques available today for better decision support, without going into the strictly technical details of these technologies. The third approach consists of examples of companies that have implemented these tools and the results obtained, as well as the trends for these tools in the coming years.
16

Maisey, Gemma. "Mining for sleep data: An investigation into the sleep of fly-In fly-out shift workers in the mining industry and potential solutions." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2023. https://ro.ecu.edu.au/theses/2618.

Abstract:
Shift work in the mining industry is a risk factor for sleep loss leading to impaired alertness, which may adversely affect health and safety. This risk is increasingly recognised by leaders and shift workers in the mining industry; however, there is limited knowledge available on the extent of sleep loss and other potential contributing factors. Furthermore, knowledge of the efficacy of individual interventions to assist shift workers in improving their sleep, and of the management of risk at an organisational level, is scarce. This PhD thesis involved three studies. The first two involved the recruitment of 88 shift workers on a fly-in, fly-out (FIFO) mining operation in Western Australia (WA), undertaken within a business-as-usual model. The third study develops a diagnostic tool to support the systematic assessment of an organisation's Fatigue Risk Management System (FRMS). Study 1 (Chapter 4) investigated sleep behaviours, the prevalence of risk of sleep disorders, and the predicted impact on alertness across the roster schedule. Sleep was objectively measured using wrist-activity monitors for the 21-day study period, and biomathematical modelling was used to predict alertness across the roster schedule. The prevalence of risk for sleep problems and disorders was determined using scientifically validated sleep questionnaires. We found sleep loss was significantly greater following day shifts and night shifts compared to days off, which resulted in a 20% reduction in alertness across the 14 consecutive shifts at the mining operation. Shift workers reported a high prevalence of risk for sleep disorders including shift work disorder (44%), obstructive sleep apnoea (OSA) (31%) and insomnia (8%); a high proportion of shift workers were obese with a body mass index (BMI) > 30 kg/m2 (23%) and consumed hazardous levels of alcohol (36%). All of these may have contributed to sleep loss.
In addition, the design of shifts and rosters, specifically early morning shift start times (<06:00) and long shift durations (>12 h), may have also adversely impacted sleep duration, as they did not allow for sufficient sleep opportunity. Study 2 (Chapter 5) was a randomised controlled trial (RCT) that investigated the efficacy of interventions to improve sleep, which included a two-hour sleep education program and biofeedback on sleep through a smartphone application. Sleep was objectively measured using wrist-activity monitors across two roster cycles (42 days), with an intervention received on day 21. Our results were inconclusive and suggest that further research is required to determine the efficacy of these commonly used interventions in the mining industry. In line with the results from Study 1, our interventions may not have been effective in improving sleep duration, as the shift and roster design did not allow adequate time off between shifts for sleep (≥7 h) and daily routines. Study 3 (Chapter 6) used a modified Delphi process that involved 16 global experts, with experience and knowledge in sleep science, chronobiology, and applied fatigue risk management within occupational settings, to define and determine the elements considered essential as part of an FRMS. This study resulted in the development of an FRMS diagnostic tool to systematically assist an organisation in assessing its current level of implementation of an FRMS. The results of the studies within this PhD thesis present several potential benefits for the mining industry. These include an enhanced understanding of the extent of sleep loss and the potential impact on alertness, in addition to contributing factors, including shift and roster design elements and unmanaged sleep disorders. The development of the FRMS diagnostic tool may practically guide mining operations on the elements required to manage risk.
These findings may also inform government, occupational health and safety regulatory authorities and shift work organisations more broadly, on the need to identify and manage fatigue, as a result of sleep loss, as a critical risk.
17

Chennen, Kirsley. "Maladies rares et "Big Data" : solutions bioinformatiques vers une analyse guidée par les connaissances : applications aux ciliopathies." Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAJ076/document.

Abstract:
Over the last decade, biomedical research and medical practice have been revolutionized by the post-genomic era and the emergence of Big Data in biology. The field of rare diseases, however, is characterized by scarcity, from the number of patients to the domain knowledge. Nevertheless, rare diseases are of real interest, as the fundamental knowledge accumulated by studying them as models, and the therapeutic solutions derived from it, can also benefit more common disorders. This thesis focuses on the development of new bioinformatics solutions, integrating Big Data and knowledge-guided approaches, to improve the study of rare diseases. In particular, my work resulted in (i) the creation of PubAthena, a literature-screening tool for recommending relevant new publications, and (ii) the development of a tool for the analysis of exome data, VarScrut, which combines multi-level knowledge to improve the diagnostic resolution rate.
18

Singh, Rahul. "A model to integrate Data Mining and On-line Analytical Processing: with application to Real Time Process Control." VCU Scholars Compass, 1999. https://scholarscompass.vcu.edu/etd/5521.

Abstract:
Since the widespread adoption of computers in business and industry, a great deal of research has been done on the design of computer systems to support the decision-making task. Decision support systems support decision makers in solving unstructured decision problems by providing tools that help them understand and analyze those problems and make better decisions. Artificial intelligence is concerned with creating computer systems that perform tasks that would require intelligence if performed by humans. Much research has focused on using artificial intelligence to develop decision support systems that provide intelligent decision support. Knowledge discovery from databases centers around data mining algorithms that discover novel and potentially useful information contained in the large volumes of data ubiquitous in contemporary business organizations. Data mining deals with large volumes of data and tries to develop multiple views that the decision maker can use to study this multi-dimensional data. On-line analytical processing (OLAP) provides a mechanism that supports multiple views of multi-dimensional data to facilitate efficient analysis. Together, these two techniques can provide a powerful mechanism for the analysis of large quantities of data to aid the task of making decisions. This research develops a model for the real-time process control of a large manufacturing process using an integrated approach of data mining and on-line analytical processing. Data mining is used to develop models of the process based on large volumes of process data. The purpose is to provide prediction and explanatory capability based on the models of the data and to allow for efficient generation of multiple views of the data so as to support analysis on multiple levels.
Artificial neural networks provide a mechanism for predicting the behavior of nonlinear systems, while decision trees provide a mechanism for the explanation of states of systems given a set of inputs and outputs. OLAP is used to generate multidimensional views of the data and support analysis based on models developed by data mining. The architecture and implementation of the model for real-time process control based on the integration of data mining and OLAP is presented in detail. The model is validated by comparing results obtained from the integrated system, OLAP-only and expert opinion. The system is validated using actual process data and the results of this verification are presented. A discussion of the results of the validation of the integrated system and some limitations of this research with discussion on possible future research directions is provided.
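As background on the explanatory half of this approach, a decision tree in its simplest form reduces to a single split (a "stump") chosen to minimize misclassification. A minimal pure-Python sketch on hypothetical process data; the features, labels and thresholds below are illustrative, not taken from the thesis:

```python
# Toy decision stump: pick the single (feature, threshold) split that best
# separates "in control" (0) from "out of control" (1) process states.
def best_stump(X, y):
    n_features = len(X[0])
    best = None  # (error, feature, threshold)
    for f in range(n_features):
        for t in sorted({row[f] for row in X}):
            # predict 1 when the feature value exceeds the threshold
            preds = [1 if row[f] > t else 0 for row in X]
            err = sum(p != label for p, label in zip(preds, y))
            if best is None or err < best[0]:
                best = (err, f, t)
    return best

# Hypothetical process data: (temperature, pressure) -> state label
X = [(70, 1.0), (72, 1.1), (85, 1.0), (90, 1.2)]
y = [0, 0, 1, 1]
err, feature, threshold = best_stump(X, y)
```

A full tree learner applies the same split search recursively to each branch; the resulting rules ("temperature > 72 implies out of control") are what gives the tree its explanatory value alongside the neural network's predictions.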
APA, Harvard, Vancouver, ISO, and other styles
19

Abugessaisa, Imad. "Analytical tools and information-sharing methods supporting road safety organizations." Doctoral thesis, Linköpings universitet, GIS - Geografiska informationssystem, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11596.

Full text
Abstract:
Reliable and consistent sources of information about traffic and accidents are a prerequisite for improving road safety: they help assess the prevailing situation and give a good indication of its severity. In many countries there is under-reporting of road accidents, deaths and injuries, no collection of data at all, or low quality of information. Potential knowledge remains hidden in the large accumulation of traffic and accident data; this limits the investigative tasks of road safety experts and thus decreases the utilization of the databases. All these factors can have serious effects on the analysis of the road safety situation, as well as on the results of the analyses. This dissertation presents a three-tiered conceptual model to support the sharing of road safety-related information and a set of applications and analysis tools. The overall aim of the research is to build and maintain an information-sharing platform, and to construct mechanisms that can support road safety professionals and researchers in their efforts to prevent road accidents. GLOBESAFE is a platform for information sharing among road safety organizations in different countries developed during this research. Several approaches were used. First, requirement elicitation methods were used to identify the exact requirements of the platform. This helped in developing a conceptual model, a common vocabulary, a set of applications, and various access modes to the system. The implementation of the requirements was based on iterative prototyping. Usability methods were introduced to evaluate the users' interaction satisfaction with the system and the various tools. Second, a system-thinking approach and a technology acceptance model were used in the study of the Swedish traffic data acquisition system. Finally, visual data mining methods were introduced as a novel approach to discovering hidden knowledge and relationships in road traffic and accident databases.
The results from these studies have been reported in several scientific articles.
APA, Harvard, Vancouver, ISO, and other styles
20

Smith, Eugene Herbie. "An analytical framework for monitoring and optimizing bank branch network efficiency / E.H. Smith." Thesis, North-West University, 2009. http://hdl.handle.net/10394/5029.

Full text
Abstract:
Financial institutions make use of a variety of delivery channels for servicing their customers. The primary channel utilised as a means of acquiring new customers and increasing market share is the retail branch network. The 1990s saw the Internet explosion and with it a threat to branches. The relatively low cost associated with virtual delivery channels made it inevitable for financial institutions to direct their focus towards such new and more cost-efficient technologies. By the beginning of the 21st century, and with increasing limitations identified in alternative virtual delivery channels, the financial industry returned to a more balanced view, which may be seen as the revival of branch networks. The main purpose of this study is to provide a roadmap for financial institutions in managing their branch networks. A three-step methodology, representative of data mining and management science techniques, will be used to explain relative branch efficiency. The methodology consists of clustering analysis (CA), data envelopment analysis (DEA) and decision tree induction (DTI). CA is applied to data internal to the financial institution to increase the discriminatory power of DEA. DEA is used to calculate the relevant operating efficiencies of branches deemed homogeneous during CA. Finally, DTI is used to interpret the DEA results and additional data describing the market environment the branch operates in, as well as inquiring into the nature of the relative efficiency of the branch.
Thesis (M.Com. (Computer Science))--North-West University, Potchefstroom Campus, 2010.
APA, Harvard, Vancouver, ISO, and other styles
21

Faramarzi, Asaad. "Intelligent computational solutions for constitutive modelling of materials in finite element analysis." Thesis, University of Exeter, 2011. http://hdl.handle.net/10036/3305.

Full text
Abstract:
Over the past decades simulation techniques, and in particular the finite element method, have been used successfully to predict the response of systems across a whole range of industries including aerospace, automotive, chemical processes, geotechnical engineering and many others. In these numerical analyses, the behaviour of the actual material is approximated with that of an idealised material that deforms in accordance with some constitutive relationships. Therefore, the choice of an appropriate constitutive model that adequately describes the behaviour of the material plays an important role in the accuracy and reliability of the numerical predictions. During the past decades several constitutive models have been developed for various materials. In recent years, owing to rapid and effective developments in computational software and hardware, alternative computer-aided pattern recognition techniques have been introduced to the constitutive modelling of materials. The main idea behind pattern recognition systems such as neural networks, fuzzy logic or genetic programming is that they learn adaptively from experience and extract various discriminants, each appropriate for its purpose. In this thesis a novel approach is presented and employed to develop constitutive models for materials in general and soils in particular based on evolutionary polynomial regression (EPR). EPR is a hybrid data mining technique that searches for symbolic structures (representing the behaviour of a system) using a genetic algorithm and estimates the constant values by the least squares method. Stress-strain data from experiments are employed to train and develop EPR-based material models. The developed models are compared with some of the existing conventional constitutive material models and their advantages are highlighted. It is also shown that the developed EPR-based material models can be incorporated in finite element (FE) analysis.
Different examples are used to verify the developed EPR-based FE model. The results of the EPR-FEM are compared with those of a standard FEM where conventional constitutive models are used to model the material behaviour. These results show that EPR-FEM can be successfully employed to analyse different structural and geotechnical engineering problems.
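The two-stage idea behind EPR (a genetic search proposes a symbolic structure; least squares then fits its constants) can be illustrated by the second stage alone. A minimal sketch in pure Python with synthetic data; the two-term structure below is an arbitrary example, not one of the models from the thesis:

```python
# Least-squares half of an EPR-style fit: given a candidate symbolic
# structure y ~ a*x + b*x**2 (the structure itself would be proposed by
# the genetic search), estimate a and b from synthetic stress-strain-like
# data by solving the 2x2 normal equations directly.
def fit_two_term(xs, ys):
    # design "matrix" columns: x and x**2
    c1 = xs
    c2 = [x * x for x in xs]
    s11 = sum(u * u for u in c1)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s22 = sum(v * v for v in c2)
    t1 = sum(u * y for u, y in zip(c1, ys))
    t2 = sum(v * y for v, y in zip(c2, ys))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 0.5 * x * x for x in xs]  # synthetic data with a=3, b=0.5
a, b = fit_two_term(xs, ys)
```

In full EPR the genetic algorithm would generate and mutate many such structures, with each candidate scored by the residual of this least-squares fit.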
APA, Harvard, Vancouver, ISO, and other styles
22

Cazarini, Aline. "Auxílio do Data Warehouse e suas ferramentas à estratégia de CRM analítico." Universidade de São Paulo, 2002. http://www.teses.usp.br/teses/disponiveis/18/18140/tde-06052016-143213/.

Full text
Abstract:
Atualmente, uma das grandes vantagens competitivas que uma empresa possui em relação a seu concorrente é a informação sobre seu cliente. As estratégias de Customer Relationship Management (CRM), propiciam o profundo conhecimento do cliente, para que a empresa possa tratá-lo de forma personalizada e reconhecê-lo como seu principal patrimônio. Segundo TAURION (2000) e DW BRASIL (2001), para suportar essa tecnologia, é necessário que as empresas possuam um repositório de dados históricos de clientes. O Data Warehouse (DW) possui diversas características que utilizam, de forma adequada e eficiente, ferramentas de desenvolvimento de modernos bancos de dados. Através da ferramenta Data Mining (DM), é possível descobrir novas correlações, padrões e tendências entre informações de uma empresa pela extração e análise dos dados do DW. A análise dos dados também pode ser feita através de sistemas On Line Analytical Proccess (OLAP), os quais ajudam analistas a sintetizar informações sobre as empresas, por meio de comparações, visões personalizadas, análise histórica e projeção de dados em vários cenários. Diante deste contexto, parece possível afirmar que o DW, juntamente com o OLAP, podem proporcionar grande suporte à estratégia de CRM. Desta forma, esta pesquisa apresenta como objetivo identificar e analisar as principais contribuições que o DW e suas ferramentas podem dar à estratégia CRM Analítico.
Nowadays, one of the great competitive advantages that a company possesses in relation to its competitors is the information about its customers. The strategies of Customer Relationship Management (CRM) provide deep knowledge about the customer, so that the company can treat them in a personalized way and recognize them as its main asset. According to TAURION (2000) and DW BRASIL (2001), to support that technology, it is necessary that companies possess a repository of customers' historical data.
The Data Warehouse (DW) possesses several characteristics that use, in an appropriate and efficient way, tools for the development of modern databases, and through the Data Mining (DM) tool it discovers new correlations, patterns and tendencies among the information of a company via the analysis of the data in the DW. The analysis of the data can also be made through On Line Analytical Process (OLAP) systems, which help analysts and executives to synthesize information on the companies by means of comparisons, personalized views, historical analysis and projection of data in several scenarios. In this context, it can be stated that DW and DM can provide great support to the strategy of CRM. Thus, this work has as its objective to identify the main contributions that DW and its tools can give to the strategy of Analytical CRM.
APA, Harvard, Vancouver, ISO, and other styles
23

Chudán, David. "Association rule mining as a support for OLAP." Doctoral thesis, Vysoká škola ekonomická v Praze, 2010. http://www.nusl.cz/ntk/nusl-201130.

Full text
Abstract:
The aim of this work is to identify the possibilities of the complementary usage of two analytical methods of data analysis: OLAP analysis and data mining, represented by GUHA association rule mining. The usage of these two methods on one dataset, in the context of the proposed scenarios, promises a synergistic effect, surpassing the knowledge acquired by the two methods independently. This is the main contribution of the work. Another contribution is the original use of GUHA association rules, where the mining is performed on aggregated data. In their abilities, GUHA association rules outperform the classic association rules reported in the literature. The experiments on real data demonstrate the finding of unusual trends in data that would be very difficult to acquire using standard methods of OLAP analysis, i.e. time-consuming manual browsing of an OLAP cube. On the other hand, the actual use of association rules loses the general overview of the data. It is possible to declare that these two methods complement each other very well. Part of the solution is also the use of the LMCL scripting language, which automates selected parts of the data mining process. The proposed recommender system would shield the user from the association rules, thereby enabling analysts unfamiliar with association rules to use their possibilities. The thesis combines quantitative and qualitative research. Quantitative research is represented by experiments on a real dataset, the proposal of a recommender system and the implementation of selected parts of the association rule mining process in the LISp-Miner Control Language. Qualitative research is represented by structured interviews with selected experts from the fields of data mining and business intelligence, who confirm the meaningfulness of the proposed methods.
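For readers unfamiliar with the underpinnings, the classical support and confidence measures that GUHA quantifiers generalize can be computed in a few lines. A toy sketch in pure Python; the market-basket transactions are invented for illustration:

```python
# Minimal association-rule check over a toy transaction table: computes
# support and confidence of the rule "antecedent -> consequent", the two
# measures that classic rules use and that GUHA 4ft-quantifiers generalize.
def rule_stats(transactions, antecedent, consequent):
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t)          # antecedent present
    ac = sum(1 for t in transactions if (antecedent | consequent) <= t)  # both present
    support = ac / n
    confidence = ac / a if a else 0.0
    return support, confidence

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence = rule_stats(transactions, {"bread"}, {"milk"})
```

A GUHA procedure such as 4ft-Miner works from the same four-fold contingency counts, but evaluates a richer family of quantifiers than the plain confidence ratio shown here.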
APA, Harvard, Vancouver, ISO, and other styles
24

Ferguson, Cary V. "Using On-line Analytical Processing (OLAP) and data mining to estimate emergency room activity in DoD Medical Treatment Facilities in the Tricare Central Region." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2001. http://handle.dtic.mil/100.2/ADA390326.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Xu, Hua. "Novel data analysis methods and algorithms for identification of peptides and proteins by use of tandem mass spectrometry." Columbus, Ohio : Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1187113396.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Kubín, Richard. "Doménové znalosti, analytické otázky, systém LISp-Miner a data ADAMEK." Master's thesis, Vysoká škola ekonomická v Praze, 2009. http://www.nusl.cz/ntk/nusl-15840.

Full text
Abstract:
This thesis deals with the steps involved in solving analytical questions over the ADAMEK medical data with the LISp-Miner system. Its objective is an operating sequence for applying the 4ft-Miner and SD4ft-Miner procedures to the ADAMEK data, together with the possibility of further use of formalized background knowledge and the preparation of a routine for automating these steps. The theoretical part summarizes the basic concepts and axioms of association rules and the GUHA method. The practical part builds on the CRISP-DM methodology. The product of the thesis is an operating sequence for finding interesting association rules in different data, which is afterwards applied to the STULONG medical data in order to obtain suggestions for its revision. The data used come from EuroMISE and concern cardiological patients.
APA, Harvard, Vancouver, ISO, and other styles
27

Wazaefi, Yanal. "Automatic diagnosis of melanoma from dermoscopic images of melanocytic tumors : Analytical and comparative approaches." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM4106.

Full text
Abstract:
Le mélanome est la forme la plus grave de cancer de la peau. Cette thèse a contribué au développement de deux approches différentes pour le diagnostic assisté par ordinateur du mélanome : approche analytique et approche comparative. L'approche analytique imite le comportement du dermatologue en détectant les caractéristiques de malignité sur la base de méthodes analytiques populaires dans une première étape, et en combinant ces caractéristiques dans une deuxième étape. Nous avons étudié l'impact d'un système de diagnostic automatique utilisant des images dermoscopiques de lésions cutanées pigmentées sur le diagnostic de dermatologues. L'approche comparative, appelée concept du Vilain Petit Canard (VPC), suppose que les naevus chez le même patient ont tendance à partager certaines caractéristiques morphologiques, de sorte que les dermatologues identifient quelques groupes de similarité. Le VPC est le naevus qui ne rentre dans aucun de ces groupes et qui est susceptible d'être un mélanome.
Melanoma is the most serious type of skin cancer. This thesis focused on the development of two different approaches for computer-aided diagnosis of melanoma: an analytical approach and a comparative approach. The analytical approach mimics the dermatologist's behavior by first detecting malignancy features based on popular analytical methods and, in a second step, by combining these features. We investigated to what extent melanoma diagnosis can be impacted by an automatic system using dermoscopic images of pigmented skin lesions. The comparative approach, called the Ugly Duckling (UD) concept, assumes that nevi in the same patient tend to share some morphological features, so that dermatologists identify a few similarity clusters. The UD is the nevus that does not fit into any of those clusters and is likely to be suspicious. The goal was to model the ability of dermatologists to build consistent clusters of pigmented skin lesions in patients.
APA, Harvard, Vancouver, ISO, and other styles
28

García, Piquer Álvaro. "Facing-up Challenges of Multiobjective Clustering Based on Evolutionary Algorithms: Representations, Scalability and Retrieval Solutions." Doctoral thesis, Universitat Ramon Llull, 2012. http://hdl.handle.net/10803/80090.

Full text
Abstract:
Aquesta tesi es centra en algorismes de clustering multiobjectiu, que estan basats en optimitzar varis objectius simultàniament obtenint una col·lecció de solucions potencials amb diferents compromisos entre objectius. El propòsit d'aquesta tesi consisteix en dissenyar i implementar un nou algorisme de clustering multiobjectiu basat en algorismes evolutius per afrontar tres reptes actuals relacionats amb aquest tipus de tècniques. El primer repte es centra en definir adequadament l'àrea de possibles solucions que s'explora per obtenir la millor solució i que depèn de la representació del coneixement. El segon repte consisteix en escalar el sistema dividint el conjunt de dades original en varis subconjunts per treballar amb menys dades en el procés de clustering. El tercer repte es basa en recuperar la solució més adequada tenint en compte la qualitat i la forma dels clusters a partir de la regió més interessant de la col·lecció de solucions ofertes per l'algorisme.
Esta tesis se centra en los algoritmos de clustering multiobjetivo, que están basados en optimizar varios objetivos simultáneamente obteniendo una colección de soluciones potenciales con diferentes compromisos entre objetivos. El propósito de esta tesis consiste en diseñar e implementar un nuevo algoritmo de clustering multiobjetivo basado en algoritmos evolutivos para afrontar tres retos actuales relacionados con este tipo de técnicas. El primer reto se centra en definir adecuadamente el área de posibles soluciones explorada para obtener la mejor solución y que depende de la representación del conocimiento.
El segundo reto consiste en escalar el sistema dividiendo el conjunto de datos original en varios subconjuntos para trabajar con menos datos en el proceso de clustering. El tercer reto se basa en recuperar la solución más adecuada según la calidad y la forma de los clusters a partir de la región más interesante de la colección de soluciones ofrecidas por el algoritmo.
This thesis is focused on multiobjective clustering algorithms, which are based on optimizing several objectives simultaneously, obtaining a collection of potential solutions with different trade-offs among objectives. The goal of the thesis is to design and implement a new multiobjective clustering technique based on evolutionary algorithms for facing up to three current challenges related to these techniques. The first challenge is focused on successfully defining the area of possible solutions that is explored in order to find the best solution, which depends on the knowledge representation. The second challenge tries to scale up the system, splitting the original data set into several data subsets in order to work with less data in the clustering process. The third challenge addresses the retrieval of the most suitable solution, according to the quality and shape of the clusters, from the most interesting region of the collection of solutions returned by the algorithm.
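The "collection of solutions" such an algorithm returns is its Pareto front: the candidates not dominated on both objectives at once. A minimal dominance filter in pure Python; both objectives are assumed to be minimized, and the candidate scores are invented for the example:

```python
# Keep only non-dominated candidates from a set of clusterings scored on
# two objectives to be minimized (e.g. compactness and connectivity).
# A point p is dominated if some q is at least as good on both objectives
# and strictly better on one.
def pareto_front(points):
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = pareto_front(candidates)
```

The retrieval challenge described above then amounts to picking one solution from this front, e.g. from its knee region, rather than from the dominated interior.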
APA, Harvard, Vancouver, ISO, and other styles
29

Erdem, Omer. "Developing A New Method In Efficiency Measurement Problems." Phd thesis, METU, 2013. http://etd.lib.metu.edu.tr/upload/12615390/index.pdf.

Full text
Abstract:
Data Envelopment Analysis (DEA) is a powerful technique for relative efficiency measurement and is used intensively in many disciplines, but it has some drawbacks. In the conventional DEA technique, the total number of inputs and outputs is limited by the number of evaluated firms, so this powerful efficiency measurement technique cannot be employed for problems with a limited number of firms. DEA uses realized data, so it can be used for objective evaluations. However, in some Occupational Health and Safety (OHS) and mining cases, subjective evaluation is also very important and should be included in DEA analyses. To overcome these drawbacks, a new technique named the AHP.DEA method is developed by integrating DEA with the Analytical Hierarchy Process (AHP). The developed method makes it possible to use more inputs and outputs in relative efficiency measurement for cases with a limited number of firms; the reliability of the estimation therefore increases with the number of inputs and outputs. The AHP.DEA technique also integrates the subjective opinion of experts with objective evaluation, and their combination can give more consistent results than purely subjective or purely objective evaluation methods. By applying the AHP.DEA method in the mining and OHS industries, managers of mining companies can compare their organizations with competitors or among their own branches and identify their strengths and weaknesses. The quantity and quality of output may thus be increased while the number of accidents decreases, and new opportunities can be identified to upgrade current operations.
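The AHP half of such an integration can be sketched briefly: a pairwise-comparison matrix is reduced to a priority weight vector, commonly approximated by normalized row geometric means. A pure-Python illustration; the comparison values are hypothetical, not taken from the thesis:

```python
# Derive AHP priority weights for three criteria from a pairwise-comparison
# matrix using the row geometric-mean approximation of the principal
# eigenvector (a standard shortcut for small, near-consistent matrices).
def ahp_weights(matrix):
    n = len(matrix)
    gmeans = []
    for row in matrix:
        prod = 1.0
        for v in row:
            prod *= v
        gmeans.append(prod ** (1.0 / n))
    total = sum(gmeans)
    return [g / total for g in gmeans]

# Hypothetical judgments: criterion A is 3x as important as B and 5x as
# important as C; B is 2x as important as C.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)
```

In an AHP.DEA-style combination, weights of this kind could then constrain or supplement the input/output weighting that DEA otherwise chooses freely.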
APA, Harvard, Vancouver, ISO, and other styles
30

Dang, Vinh Q. "Evolutionary approaches for feature selection in biological data." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2014. https://ro.ecu.edu.au/theses/1276.

Full text
Abstract:
Data mining techniques have been used widely in many areas such as business, science, engineering and medicine. The techniques allow a vast amount of data to be explored in order to extract useful information from the data. One of the foci in the health area is finding interesting biomarkers from biomedical data. Mass-throughput data generated from microarrays and mass spectrometry from biological samples are high-dimensional and small in sample size. Examples include DNA microarray datasets with up to 500,000 genes and mass spectrometry data with 300,000 m/z values. While the availability of such datasets can aid in the development of techniques/drugs to improve diagnosis and treatment of diseases, a major challenge involves their analysis to extract useful and meaningful information. The aims of this project are: 1) to investigate and develop feature selection algorithms that incorporate various evolutionary strategies, 2) to use the developed algorithms to find the "most relevant" biomarkers contained in biological datasets and 3) to evaluate the goodness of extracted feature subsets for relevance (examined in terms of existing biomedical domain knowledge and from classification accuracy obtained using different classifiers). The project aims to generate good predictive models for classifying diseased samples from controls.
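A schematic of what evolutionary feature selection looks like in its simplest form: bit-mask individuals, a fitness that trades predictive value against subset size, truncation selection and point mutation. This toy sketch uses a synthetic fitness in which the informative features are known by construction; a real run would instead score each mask with a classifier's cross-validated accuracy:

```python
import random

# Toy evolutionary feature selection over 10 features, of which only
# indices 0 and 1 are (by construction) informative.
random.seed(0)
N_FEATURES = 10
INFORMATIVE = {0, 1}

def fitness(mask):
    # proxy for classifier accuracy: reward informative features, penalize size
    hits = len(INFORMATIVE & {i for i, bit in enumerate(mask) if bit})
    return hits - 0.05 * sum(mask)

def mutate(mask):
    # flip one random bit
    i = random.randrange(N_FEATURES)
    child = list(mask)
    child[i] ^= 1
    return tuple(child)

pop = [tuple(random.randint(0, 1) for _ in range(N_FEATURES)) for _ in range(20)]
for _ in range(100):
    pop.sort(key=fitness, reverse=True)
    # keep the 10 best, refill with mutants of the survivors
    pop = pop[:10] + [mutate(random.choice(pop[:10])) for _ in range(10)]

best = max(pop, key=fitness)
selected = {i for i, bit in enumerate(best) if bit}
```

With the size penalty, the search is pushed towards small subsets containing both informative features, which is the behaviour wanted when hunting for a handful of biomarkers among hundreds of thousands of candidates.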
APA, Harvard, Vancouver, ISO, and other styles
31

Oliveira, Saullo Haniell Galvão de 1988. "On biclusters aggregation and its benefits for enumerative solutions = Agregação de biclusters e seus benefícios para soluções enumerativas." [s.n.], 2015. http://repositorio.unicamp.br/jspui/handle/REPOSIP/259072.

Full text
Abstract:
Orientador: Fernando José Von Zuben
Dissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
Resumo: Biclusterização envolve a clusterização simultânea de objetos e seus atributos, definindo modelos locais de relacionamento entre os objetos e seus atributos. Assim como a clusterização, a biclusterização tem uma vasta gama de aplicações, desde suporte a sistemas de recomendação, até análise de dados de expressão gênica. Inicialmente, diversas heurísticas foram propostas para encontrar biclusters numa base de dados numérica. No entanto, tais heurísticas apresentam alguns inconvenientes, como não encontrar biclusters relevantes na base de dados e não maximizar o volume dos biclusters encontrados. Algoritmos enumerativos são uma proposta recente, especialmente no caso de bases numéricas, cuja solução é um conjunto de biclusters maximais e não redundantes. Contudo, a habilidade de enumerar biclusters trouxe mais um cenário desafiador: em bases de dados ruidosas, cada bicluster original se fragmenta em vários outros biclusters com alto nível de sobreposição, o que impede uma análise direta dos resultados obtidos. Essa fragmentação irá ocorrer independente da definição escolhida de coerência interna no bicluster, sendo mais relacionada com o próprio nível de ruído. Buscando reverter essa fragmentação, nesse trabalho propomos duas formas de agregação de biclusters a partir de resultados que apresentem alto grau de sobreposição: uma baseada na clusterização hierárquica com single linkage, e outra explorando diretamente a taxa de sobreposição dos biclusters. Em seguida, um passo de poda é executado para remover objetos ou atributos indesejados que podem ter sido incluídos como resultado da agregação.
As duas propostas foram comparadas entre si e com o estado da arte, em diversos experimentos, incluindo bases de dados artificiais e reais. Essas duas novas formas de agregação não só reduziram significativamente a quantidade de biclusters, essencialmente defragmentando os biclusters originais, mas também aumentaram consistentemente a qualidade da solução, medida em termos de precisão e recuperação, quando os biclusters são conhecidos previamente.
Abstract: Biclustering involves the simultaneous clustering of objects and their attributes, thus defining local models for the two-way relationship of objects and attributes. Just like clustering, biclustering has a broad set of applications, ranging from an advanced support for recommender systems of practical relevance to a decisive role in data mining techniques devoted to gene expression data analysis. Initially, heuristics have been proposed to find biclusters, and their main drawbacks are the possibility of losing some existing biclusters and the incapability of maximizing the volume of the obtained biclusters. Recently, efficient algorithms were conceived to enumerate all the biclusters, particularly in numerical datasets, so that they compose a complete set of maximal and non-redundant biclusters. However, the ability to enumerate biclusters revealed a challenging scenario: in noisy datasets, each true bicluster becomes highly fragmented and with a high degree of overlapping, thus preventing a direct analysis of the obtained results. Fragmentation will happen no matter the boundary condition adopted to specify the internal coherence of the valid biclusters, though the degree of fragmentation will be associated with the noise level. Aiming at reverting the fragmentation, we propose here two approaches for properly aggregating a set of biclusters exhibiting a high degree of overlapping: one based on single linkage and the other directly exploring the rate of overlapping.
A pruning step is then employed to filter intruder objects and/or attributes that were added as a side effect of aggregation. Both proposals were compared with each other and also with the actual state of the art in several experiments, including real and artificial datasets. The two newly-conceived aggregation mechanisms not only significantly reduced the number of biclusters, essentially defragmenting the true biclusters, but also consistently increased the quality of the whole solution, measured in terms of Precision and Recall when the composition of the dataset is known a priori.
Mestrado. Engenharia de Computação. Mestre em Engenharia Elétrica
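The overlap-based variant of the aggregation described above can be sketched in a few lines: represent each bicluster as a set of (row, column) cells and greedily merge pairs while their Jaccard overlap exceeds a threshold. A toy illustration in pure Python; the fragments and the 0.3 threshold are invented for the example and do not reproduce the thesis's exact procedure:

```python
# Greedy overlap-based aggregation of bicluster fragments. Each bicluster
# is a set of (row, col) cells; two biclusters merge when their Jaccard
# overlap meets the threshold, and merging repeats until stable.
def jaccard(a, b):
    return len(a & b) / len(a | b)

def aggregate(biclusters, threshold=0.3):
    merged = [set(b) for b in biclusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if jaccard(merged[i], merged[j]) >= threshold:
                    merged[i] |= merged[j]
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged

# two fragments of one true bicluster, plus one unrelated bicluster
frag1 = {(r, c) for r in (0, 1, 2) for c in (0, 1)}
frag2 = {(r, c) for r in (1, 2, 3) for c in (0, 1)}
other = {(r, c) for r in (8, 9) for c in (5, 6)}
result = aggregate([frag1, frag2, other])
```

The pruning step would then re-check each merged bicluster and drop rows or columns that break its internal coherence.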
APA, Harvard, Vancouver, ISO, and other styles
32

Jarosch, Martin. "Klasifikace v proudu dat pomocí souboru klasifikátorů." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-235468.

Full text
Abstract:
This master's thesis deals with knowledge discovery and is focused on data stream classification. Three ensemble classification methods are described here. These methods are implemented in the practical part of the thesis and included in the classification system. Extensive measurements and experimentation were used for method analysis and comparison. The implemented methods were then integrated into a malware analysis system. The results obtained are presented in the conclusion.
APA, Harvard, Vancouver, ISO, and other styles
33

Mašek, Martin. "Datové sklady - principy, metody návrhu, nástroje, aplikace, návrh konkrétního řešení." Master's thesis, Vysoká škola ekonomická v Praze, 2007. http://www.nusl.cz/ntk/nusl-10145.

Full text
Abstract:
The main goal of this thesis is to summarize and introduce the general theoretical concepts of Data Warehousing by using the systems approach. The thesis defines Data Warehousing and its main areas and delimits the Data Warehousing area within the higher-level area called Business Intelligence. It also describes the history of Data Warehousing & Business Intelligence, focuses on the key principles of Data Warehouse building and explains the practical applications of this solution. The aim of the practical part is to evaluate the theoretical concepts and, based on that, to design and build a Data Warehouse in the environment of an existing company. The final solution includes the Data Warehouse design, hardware and software platform selection, loading with real data by using ETL services and building end-user reports. The practical part also aims to demonstrate the power of this technology and to contribute to the business decision-making process in this company.
APA, Harvard, Vancouver, ISO, and other styles
34

Pohl, Ondřej. "Analýza veřejně dostupných dat Českého statistického úřadu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-363884.

Full text
Abstract:
The aim of this thesis is the analysis of data of the Czech Statistical Office concerning foreign trade. First, the reader is familiarized with Business Intelligence and data warehousing, and the basics of OLAP analysis and data mining are explained. The following parts of the thesis describe and analyse the foreign trade data with the help of OLAP technology and data mining in MS SQL Server, including the implementation of selected analytical tasks.
APA, Harvard, Vancouver, ISO, and other styles
35

Arenas, Tawil Abraham José. "Mathematical modelling of virus RSV: qualitative properties, numerical solutions and validation for the case of the region of Valencia." Doctoral thesis, Universitat Politècnica de València, 2010. http://hdl.handle.net/10251/8316.

Full text
Abstract:
This dissertation focuses, first, on modelling the behaviour of seasonal diseases by means of systems of differential equations, and on the study of dynamic properties such as positivity, periodicity and stability of the analytical solutions, as well as on the construction of numerical schemes to approximate the solutions of nonlinear first-order systems of differential equations that model seasonal infectious diseases such as the transmission of the Respiratory Syncytial Virus (RSV). Two mathematical models of seasonal diseases are generalized and shown to have periodic solutions by means of a coincidence theorem of Jean Mawhin. To corroborate the analytical results, numerical schemes are developed using the nonstandard finite difference techniques introduced by Ronald Mickens and the differential transformation method; these schemes reproduce the dynamic behaviour of the analytical solutions, such as positivity and periodicity. Finally, numerical simulations are carried out using the implemented schemes and parameters deduced from clinical data on people infected with the RSV virus in the Region of Valencia. The results are compared against those produced by the Euler and Runge-Kutta methods and Matlab's ODE45 routine, and better approximations are verified for step sizes larger than those normally used by these traditional schemes.
Arenas Tawil, AJ. (2009). Mathematical modelling of virus RSV: qualitative properties, numerical solutions and validation for the case of the region of Valencia [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8316
APA, Harvard, Vancouver, ISO, and other styles
36

"Event Analytics on Social Media: Challenges and Solutions." Doctoral diss., 2014. http://hdl.handle.net/2286/R.I.27510.

Full text
Abstract:
abstract: Social media platforms such as Twitter, Facebook, and blogs have emerged as valuable - in fact, the de facto - virtual town halls for people to discover, report, share and communicate with others about various types of events. These events range from widely known events such as the U.S. Presidential debate to smaller-scale, local events such as a local Halloween block party. During these events, we often witness a large amount of commentary contributed by crowds on social media. This burst of social media responses surges with the "second-screen" behavior and greatly enriches the user experience when interacting with the event, as well as people's awareness of an event. Monitoring and analyzing this rich and continuous flow of user-generated content can yield unprecedentedly valuable information about the event, since these responses usually offer far richer and more powerful views of the event than mainstream news could achieve. Despite these benefits, social media also tends to be noisy, chaotic, and overwhelming, posing challenges to users in seeking and distilling high-quality content from that noise. In this dissertation, I explore ways to leverage social media as a source of information and analyze events based on their social media responses collectively. I develop, implement and evaluate EventRadar, an event analysis toolbox which is able to identify, enrich, and characterize events using the massive amounts of social media responses. EventRadar contains three automated, scalable tools to handle three core event analysis tasks: Event Characterization, Event Recognition, and Event Enrichment. More specifically, I develop ET-LDA, a Bayesian model, and SocSent, a matrix factorization framework, for handling the Event Characterization task, i.e., characterizing an event in terms of its topics and its audience's response behavior (via ET-LDA), and the sentiments regarding its topics (via SocSent).
I also develop DeMa, an unsupervised event detection algorithm for handling the Event Recognition task, i.e., detecting trending events from a stream of noisy social media posts. Last, I develop CrowdX, a spatial crowdsourcing system for handling the Event Enrichment task, i.e., gathering additional first-hand information (e.g., photos) from the field to enrich the given event's context. Enabled by EventRadar, it becomes more feasible to uncover patterns that have not been explored previously and to re-validate existing social theories with new evidence. As a result, I am able to gain deep insights into how people respond to the events that they are engaged in. The results reveal several key insights into people's varied responding behavior over an event's timeline, such as the finding that the topical context of people's tweets does not always correlate with the timeline of the event. In addition, I also explore the factors that affect a person's engagement with real-world events on Twitter and find that people engage in an event because they are interested in the topics pertaining to that event; and while engaging, their engagement is largely affected by their friends' behavior.<br>Dissertation/Thesis<br>Doctoral Dissertation Computer Science 2014
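The Event Recognition task above hinges on spotting bursts in a noisy post stream. The dissertation's DeMa algorithm is unsupervised and more sophisticated; as a hedged illustration of the general idea only, a minimal sliding-window z-score burst heuristic over binned post counts might look like:

```python
from statistics import mean, stdev

def detect_bursts(counts, window=5, threshold=3.0):
    """Flag time bins whose post count deviates sharply from the trailing
    window average. A generic burst heuristic, not the author's DeMa."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            sigma = 1.0  # flat history: avoid division by zero
        if (counts[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts
```

A spike of 50 posts after a steady baseline of about 10 per bin would be flagged, while the surrounding steady bins would not.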
APA, Harvard, Vancouver, ISO, and other styles
37

Yang, Yao-Ting, and 楊堯婷. "Analytical Customer Relationship Management in Data Mining of Hypermarket." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/73353795489824809049.

Full text
Abstract:
Master's thesis<br>Fu Jen Catholic University<br>Graduate Institute of Applied Statistics<br>98<br>As the global financial crisis triggered by the bankruptcy of Lehman Brothers on September 15, 2008 spread, consumers were affected, forced to tighten their purse strings and change their consumption behavior. Unlike many other affected industries, the hypermarket industry showed counter-cyclical growth in sales. This study focuses on three major topics. First, a process of quantified marketing strategy analysis is presented for the rarely studied hypermarket sector: theories are combined with practical solutions used by the industry to develop an information-integrated framework for marketing strategy analysis. Second, the framework of the analytical customer relationship management system within this marketing strategy analysis is reviewed to introduce contextual solutions. Finally, data mining methodology is adopted to analyze customer transaction history in the food department of a hypermarket. Concepts from segment marketing and cluster analysis are used to divide customers into four groups, and each group is named; discriminant analysis is then employed to identify variables that distinguish the groups. Different marketing strategy recommendations for each group are proposed as references for hypermarket businesses in planning marketing activities, managing analytical customer relationships, and verifying the feasibility of analytical customer relationship management.
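The cluster analysis step described above (dividing customers into four groups) is typically done with an algorithm such as k-means. As a minimal illustrative sketch (not the thesis's actual procedure or variables), a plain k-means on two customer features could look like:

```python
import random

def kmeans(points, k=4, iters=20, seed=0):
    """Plain k-means on 2-D points: a minimal sketch of the kind of
    cluster analysis used for customer segmentation (illustrative only)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assign each point to its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            groups[j].append(p)
        # recompute each center as the mean of its group
        for j, g in enumerate(groups):
            if g:
                centers[j] = (sum(x for x, _ in g) / len(g),
                              sum(y for _, y in g) / len(g))
    return centers, groups
```

In practice each point would be a customer's feature vector (e.g., spending and visit frequency), and the resulting groups would then be profiled and named as in the study.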
APA, Harvard, Vancouver, ISO, and other styles
38

Chen, Mei-Chun, and 陳美君. "Applying On-line Analytical Processing and Data Mining for Analyzing NetFlow Data." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/91920413850902657532.

Full text
Abstract:
Master's thesis<br>National Chiao Tung University<br>Information Management Group, Executive Master's Program, College of Management<br>95<br>This study focuses on analyzing internet traffic using NetFlow technology. OLAP is used to analyze flow traffic information and monitor the real-time status of the network platform. The study aims to find signatures of abnormal network behavior by analyzing historical NetFlow traffic generated during attacks of the CodeRed and MSBlast worms. A decision tree is applied to find thresholds for abnormal network behavior; these thresholds and the proposed analysis techniques are then implemented to detect abnormal NetFlow traffic. Experiments are conducted to verify the accuracy of the thresholds.
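The threshold-finding step above can be reduced, in its simplest form, to a one-feature decision stump: scanning candidate split points on a traffic metric and keeping the one that best separates labelled normal and abnormal flows. This sketch is illustrative only and is not the thesis's decision tree:

```python
def best_threshold(values, labels):
    """Search a single split point that best separates abnormal (1) from
    normal (0) flow counts. A one-feature decision stump, sketching how a
    decision tree derives an alert threshold (illustrative only)."""
    pairs = sorted(zip(values, labels))
    best, best_acc = None, -1.0
    for i in range(1, len(pairs)):
        # candidate threshold: midpoint between adjacent sorted values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        correct = sum(1 for v, y in pairs if (v > t) == bool(y))
        acc = correct / len(pairs)
        if acc > best_acc:
            best, best_acc = t, acc
    return best, best_acc
```

On flow counts labelled with known worm-attack periods, the stump recovers a cut-off that can then be deployed as a real-time alert threshold.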
APA, Harvard, Vancouver, ISO, and other styles
39

Oliveira, Vera. "Analytical Customer Relationship Management in Retailing Supported by Data Mining Techniques." Tese, 2012. http://hdl.handle.net/10216/74935.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Oliveira, Vera Lúcia Miguéis. "Analytical customer relationship management in retailing supported by data mining techniques." Doctoral thesis, 2012. http://hdl.handle.net/10216/69283.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Oliveira, Vera Lúcia Miguéis. "Analytical Customer Relationship Management in Retailing Supported by Data Mining Techniques." Doctoral thesis, 2012. https://repositorio-aberto.up.pt/handle/10216/64884.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Oliveira, Vera Lúcia Miguéis. "Analytical customer relationship management in retailing supported by data mining techniques." Tese, 2012. http://hdl.handle.net/10216/69283.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Huang, Ding-Jie, and 黃鼎傑. "Finding partial Pareto-optimal solutions using data mining and genetic algorithms." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/56655159341041560526.

Full text
Abstract:
Master's thesis<br>National Chung Hsing University<br>Department of Mechanical Engineering<br>103<br>Data mining is applied to solve multi-objective optimization problems in this thesis. To obtain rules from data mining, sample points are placed in the design space using the uniform design method, and the values of the objective functions at these points are computed. Based on the user's needs, specific objective intervals are selected and the corresponding design spaces are obtained by data mining techniques. Within the design spaces found, a genetic algorithm (GA) is used to find Pareto-optimal solutions. With the proposed method there is no need to find the complete Pareto front; only useful solutions are generated, which is convenient and saves computational time. The classification and clustering techniques of data mining are used to find, from a small number of sample points, the ranges of design variables that may generate objective values in the selected intervals. Several problems, including non-structural and structural design problems, are used to test the idea, and the accuracy of the solutions is analyzed. Many real-life optimization problems contain discrete or mixed-discrete variables and constraints, and different Pareto fronts can be obtained depending on the nature of the design variables.
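At the heart of any Pareto-based GA, such as the one described above, is the non-dominance filter: a candidate survives only if no other candidate is at least as good in every objective and strictly better in one. A minimal sketch (assuming minimization of all objectives; not the thesis's GA):

```python
def dominates(q, p):
    """True if q is at least as good as p in every objective and strictly
    better in at least one (minimization convention)."""
    return (all(qi <= pi for qi, pi in zip(q, p))
            and any(qi < pi for qi, pi in zip(q, p)))

def pareto_front(points):
    """Keep only the non-dominated objective vectors: the filtering step
    at the core of a Pareto-based multi-objective GA."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Applying this filter to a GA population's objective values at each generation yields the (partial) front that the thesis then restricts to the user-selected objective intervals.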
APA, Harvard, Vancouver, ISO, and other styles
44

Chang, Hsu-Wei, and 張書緯. "The Analytical Model Setup of Wafer Probing Overkill by Using Data Mining Technology." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/03551125754958910180.

Full text
Abstract:
Master's thesis<br>National Chung Hsing University<br>Department of Computer Science and Engineering<br>105<br>After many years of growth, Taiwan's semiconductor manufacturing industry has pursued lower production costs and higher profits through an increasingly fine division of labor: most resources are devoted to wafer fabrication equipment, while back-end wafer testing is outsourced to downstream packaging and testing companies. Because the technical barrier is comparatively low, a testing house mainly needs to purchase test machines and hire labor for mass production; with many firms competing, the one that can provide faster solutions and better-optimized processes builds the competitive advantage. For a testing supplier, a key quality problem is overkill: good dies that are wrongly judged as failing during wafer probing. Overkill forces unnecessary retests, increases testing cost and cycle time, and directly damages delivery schedules and customer confidence. This study applies data mining technology to build an analytical model of wafer probing overkill, progressively constructing causal evidence for why misjudgments occur, so that overkill can be detected in a timely manner, unnecessary retest expenditure can be avoided, and competitiveness can be improved.
In the factory, test throughput has the highest priority; when the failure rate is abnormally high, troubleshooting takes many hours and causes real production losses, which makes an analytical model for identifying overkill especially valuable.
APA, Harvard, Vancouver, ISO, and other styles
45

Ren, Ju Yuh, and 朱毓仁. "A Research by Using Digital Analysis and Data Mining As an Auditing Analytical Procedure." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/77125341662009027489.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Chang, Wen-Tu, and 張文圖. "A Study on Marketing Using on-line analytical processing and Data mining — with Telecommunication subscriber." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/85145739737983142085.

Full text
Abstract:
Master's thesis<br>National Yunlin University of Science and Technology<br>Master's Program, Department of Information Management<br>90<br>The liberalization and privatization of the state-monopolized telecommunications industry in Taiwan has become a salient trend. In order to compete or maintain advantages in such a competitive environment, telecommunications companies aim to increase their market shares through various marketing strategies. The target marketing approach of segmenting customers into different clusters and focusing product promotion on selected clusters has been widely adopted. In this research, a data mining approach was applied to analyze the mass of information. In the first stage, Structured Query Language (SQL) was used to search for disconnecting subscribers in the communications database; repairmen's telephone interviews and home-service testing were applied, and OnLine Analytical Processing (OLAP) was used to analyze the characteristics of low-connection subscribers. The results showed that various factors are related to the degrees of low connection, and that improving low connection increases both the confidence of repairmen and the effectiveness of target marketing. Subsequently, to find specific customers with the identified characteristics, a link analysis technique was proposed. An empirical study was conducted to validate the effectiveness of the proposed approach; the response rate of the target marketing strategy was more than three times that of a mass marketing one. The contributions of this research include not only a data mining approach for target marketing but also suggested data mining applications for the telecommunications industry.
APA, Harvard, Vancouver, ISO, and other styles
47

Lien, Wei-Ling, and 連偉伶. "A study of house buying decision factors by applying data mining techniques and analytical hierarchy process." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/sxsx6n.

Full text
Abstract:
Master's thesis<br>National Taipei University of Technology<br>Institute of Information and Logistics Management<br>100<br>In real estate cycle studies, house buying decision making has been extensively investigated in recent years. Most of the literature uses regression models, factor analysis, and similar techniques to deal with historical real estate transaction data. However, as the Internet and computer technology have significantly changed the framework of data collection, transaction data in the modern trading environment often contain millions (or more) of records, and discovering potentially interesting patterns in such databases with statistical techniques such as regression or factor analysis is difficult. Developing an analytical technique for such enormous transaction databases must therefore be taken into account. This study develops a two-phase approach to support house buying decisions. In phase one, the k-means algorithm, a well-known clustering method in data mining, is used for customer segmentation, and potentially interesting patterns and house buying factors are then sought by association rule mining. In phase two, the Analytic Hierarchy Process is used to explore house buying decisions with the factors discovered in phase one. This study provides related house buying decision factors and their corresponding rankings for further applications; real estate agents may use these conclusions as a reference for decision making in the real estate market.
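In the second phase above, the Analytic Hierarchy Process ranks decision factors from a pairwise comparison matrix. A common approximation to the principal-eigenvector priorities is the row geometric mean; the following is an illustrative sketch only, with a made-up comparison matrix, not the thesis's data:

```python
from math import prod

def ahp_weights(matrix):
    """Approximate AHP priority weights from a pairwise comparison matrix
    by the row geometric mean, a standard stand-in for the principal
    eigenvector (exact when the matrix is perfectly consistent)."""
    n = len(matrix)
    gms = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gms)
    return [g / total for g in gms]
```

For a consistent 3x3 matrix built from underlying weights 0.6, 0.3 and 0.1, the function recovers those weights exactly; in a real AHP study one would also compute a consistency ratio before trusting the ranking.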
APA, Harvard, Vancouver, ISO, and other styles
48

Wu, Pin-Hao, and 吳品豪. "Using data mining technique to search partial multi-objective optimal solutions in some designated objective ranges." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/45843117282731577363.

Full text
Abstract:
Master's thesis<br>National Chung Hsing University<br>Department of Mechanical Engineering<br>102<br>Data mining is applied to solve multi-objective optimization problems in this thesis. To obtain rules from data mining, sample points are placed in the design space using the uniform design method; the number of sample points is determined by the number of variables and the complexity of the functions. The values of the objective functions at the sample points are then computed. Based on the user's demand, specific objective intervals are selected, and the classification and clustering techniques of data mining are used to find the ranges of design variables that may generate objective values in the selected intervals. To increase the accuracy of the ranges found, a second stage of classification and clustering is performed on them. Within the final ranges, one point is generated randomly as the initial point for solving the multi-objective optimization problem. Sequential quadratic programming (SQP) is combined with the weighted sum method or the compromise programming method to search for Pareto-optimal solutions, which are expected to lie in the selected objective intervals. The solutions obtained are compared with the complete Pareto fronts in related papers. With the proposed method there is no need to find the complete Pareto front; only the Pareto solutions of interest are found, which not only saves a great deal of computational time but also satisfies the user's need.
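The weighted sum method mentioned above scalarizes the objectives as w·f1 + (1-w)·f2 and solves a single-objective problem per weight choice. The thesis couples this scalarization with SQP; in the hedged sketch below a finite candidate grid stands in for the continuous optimizer:

```python
def weighted_sum_min(f1, f2, w, candidates):
    """Scalarize two objectives as w*f1 + (1-w)*f2 and return the best
    candidate. Illustrative only: the thesis pairs this scalarization
    with SQP rather than a grid search."""
    return min(candidates, key=lambda x: w * f1(x) + (1 - w) * f2(x))
```

Sweeping w over (0, 1) traces out points on the (convex part of the) Pareto front; restricting the sweep to weights whose solutions fall in the selected objective intervals yields exactly the partial front the thesis targets.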
APA, Harvard, Vancouver, ISO, and other styles
49

Amorim, Inês Oliveira. "Analytical CRM in a management consulting firm : an application of data driven techniques." Master's thesis, 2021. http://hdl.handle.net/10400.14/34745.

Full text
Abstract:
Considering the competitive environment in which companies operate nowadays and the importance of customer relationship management (CRM), it is crucial to analyse customer-related data to gain knowledge and insights about customers in order to increase their retention and the company's performance. The investigation presented here resulted from a curricular internship carried out at Inova+, a management consulting firm specialised in supporting the growth of organisations. The aim of this investigation is to support the CRM system and the customer management strategies of Inova+, contributing to the improvement and strengthening of relations between the company and its customers. For this purpose, a quantitative methodology using analytical tools, namely data mining tools, was adopted to study various dimensions of CRM. The investigation focused on four main aspects, which together provide a more detailed knowledge of the company's customers. First, the observation of KPIs regarding CRM and the company's performance through the construction of dashboards. Second, the application of a time-series forecasting model for prospective revenues. Third, the identification of customer segments according to their purchasing behaviour through the application of an RFM model and a clustering analysis. Finally, the identification of significant factors that influence the probability of adjudication of a commercial proposal, such as the country, type of organisation and economic sector of the client company, as well as the service associated.
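The RFM model mentioned above scores each customer on Recency, Frequency and Monetary value, typically by rank bins. The sketch below is a minimal illustration with hypothetical customer tuples (days since last purchase, purchase count, total spend), not the thesis's actual scoring scheme:

```python
def rfm_scores(customers, bins=3):
    """Score each customer 1..bins on Recency, Frequency and Monetary by
    rank bins. Recency is reversed: a recent (small) value earns a high
    score. Illustrative only."""
    def score(values, reverse=False):
        order = sorted(range(len(values)),
                       key=lambda i: values[i], reverse=reverse)
        s = [0] * len(values)
        for rank, i in enumerate(order):
            s[i] = 1 + (rank * bins) // len(values)
        return s
    r = score([c[0] for c in customers], reverse=True)  # fewer days -> better
    f = score([c[1] for c in customers])
    m = score([c[2] for c in customers])
    return list(zip(r, f, m))
```

Customers with identical high (or low) scores across the three dimensions then form natural seed segments, which the study refines further with clustering.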
APA, Harvard, Vancouver, ISO, and other styles
50

Schwarz, Holger [Verfasser]. "Integration von Data-Mining und online analytical processing : eine Analyse von Datenschemata, Systemarchitekturen und Optimierungsstrategien / vorgelegt von Holger Schwarz." 2003. http://d-nb.info/968816657/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
